With the development of deep learning-based text detection and recognition, extracting text from smartphone-captured or scanned images has become effortless. However, this convenience gives rise to privacy concerns, especially for individuals and organizations unwilling to let their documents be read and analyzed by deep learning models.
We present a novel solution for scanned document privacy protection. First, we propose a cross-model universal adversarial texture that mimics natural paper textures to safeguard scanned document images against multiple text detection models. Second, we curate a comprehensive dataset of scanned financial documents with line-level and character-level text annotations for both English and Chinese documents, providing diverse data for evaluating our protection method.
Experimental results indicate a significant reduction in H-mean scores of various text detection models, demonstrating the efficacy of our cross-model universal adversarial texture in preserving document privacy.
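As an illustration of the core idea, here is a minimal sketch, not the authors' exact pipeline, of how a universal adversarial texture could be blended into a scanned page under an L∞ perturbation budget. The `apply_texture` name, tensor shapes, and `eps` budget are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def apply_texture(scan: torch.Tensor, texture: torch.Tensor, eps: float) -> torch.Tensor:
    """Overlay a paper-like adversarial texture on a scanned page.

    scan:    (3, H, W) clean document image in [0, 1]
    texture: (3, h, w) learned texture tile (any size)
    eps:     maximum per-pixel perturbation (L-infinity budget)
    """
    # Resize the texture to cover the whole page (tiling is another option).
    tex = F.interpolate(texture.unsqueeze(0), size=scan.shape[-2:],
                        mode="bilinear", align_corners=False).squeeze(0)
    # Center the texture around zero and clip it to the perturbation budget.
    delta = torch.clamp(tex - 0.5, -eps, eps)
    # Blend and keep the result a valid image.
    return torch.clamp(scan + delta, 0.0, 1.0)
```

Because the same learned texture is reused across pages and across detectors, the perturbation is universal rather than computed per image.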
Our pipeline for crafting the cross-model adversarial texture.
A sample scanned page with annotations from our dataset.
Cross-model attack.
Results show a balanced and effective attack on three text detection models, on both the FUNSD benchmark and our proposed AdvFinDocument dataset.
We explore the effect of textures with different degrees of perturbation on PSENet text-line detection (a sketch of this ε sweep follows the panel labels below). The results show that even when the perturbation is very small, the proposed adversarial texture still hides the majority of the text, demonstrating the effectiveness of the proposed method.
Panels: clean scan, followed by adversarial textures with ε = 0.05, 0.06, 0.07, 0.08, and 0.09.
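A hedged sketch of the ε sweep behind these panels, reusing the hypothetical `apply_texture` helper from the earlier snippet; `detector` stands for any text detector wrapper (e.g., around PSENet) that returns detected text regions and is an assumption, not a real API.

```python
def sweep_epsilon(scan, texture, detector, eps_values=(0.05, 0.06, 0.07, 0.08, 0.09)):
    """Count surviving text detections at each perturbation budget."""
    results = {}
    for eps in eps_values:
        perturbed = apply_texture(scan, texture, eps)  # helper from the earlier sketch
        results[eps] = len(detector(perturbed))        # fewer boxes -> more text hidden
    return results
```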
With the inclusion of per-model weights, the attack performance on all three models becomes more balanced, and the overall attack performance improves.
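The balancing idea can be sketched as a weighted multi-detector objective. This is an assumption-laden illustration (the callable losses, weights, and function name are ours), not the paper's exact loss.

```python
import torch

def ensemble_attack_loss(perturbed, detector_losses, weights):
    """perturbed:       adversarially textured document batch (tensor)
       detector_losses: list of callables, each returning a scalar detection loss
       weights:         per-model weights that rebalance the ensemble"""
    total = torch.zeros((), device=perturbed.device)
    for loss_fn, w in zip(detector_losses, weights):
        # Driving each detector's text confidence down hides text from that model;
        # the weights keep any single detector from dominating the update.
        total = total + w * loss_fn(perturbed)
    return total
```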
These excellent works have inspired us.
T-SEA: Transfer-based Self-Ensemble Attack on Object Detection. A framework for universal (cross-model and cross-instance) patch-based adversarial attacks.
Adversarial Texture for Fooling Person Detectors in the Physical World. Proposed Toroidal Cropping (TC) to generate continuous adversarial textures of arbitrary size (see the sketch below).
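For reference, a minimal sketch of the toroidal-cropping idea: crops wrap around the texture borders, so a small learned tile behaves like a seamless texture of arbitrary size. The function name and shapes are illustrative assumptions, not the cited authors' code.

```python
import torch

def toroidal_crop(texture: torch.Tensor, top: int, left: int, h: int, w: int) -> torch.Tensor:
    """Crop a (C, H, W) texture with wrap-around (toroidal) boundaries."""
    C, H, W = texture.shape
    rows = torch.arange(top, top + h) % H    # row indices wrap past the bottom edge
    cols = torch.arange(left, left + w) % W  # column indices wrap past the right edge
    return texture[:, rows][:, :, cols]      # gather the wrapped (C, h, w) crop
```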
@misc{Ye2024CMAT,
  author = {Xiaoyu Ye and Jingjing Yu and Jungang Li and Yiwen Zhao and Qiutong Liu},
  title  = {CMAT: A Cross-Model Adversarial Texture for Scanned Document Privacy Protection},
  year   = {2024},
  url    = {https://github.com/LJungang/CMAT}
}