Generating Synthetic Data for Deep Neural Network Classification Using Global Differential Privacy Based Optimization

Authors

  • Shalini Agarwal, Amity School of Engineering and Technology, Amity University, Uttar Pradesh, Lucknow, India

Keywords

Classifier, Differential Privacy, Synthetic Dataset, Privacy Budget, Prompt Variance Loss

Abstract

Anonymizing individual text samples before dissemination is an open research problem in Natural Language Processing (NLP). Significant effort has been devoted to constructing such mechanisms by employing Local Differential Privacy (LDP) in the model training phase. However, LDP requires substantial noise in the update rule, which often degrades the quality of the generated language. In this study, we address this limitation by introducing Global Differential Privacy (GDP): we first train a generative language model in a differentially private manner and subsequently sample data from it. To do so, we introduce a novel Prompt Variance Loss (PVL) that enables the model to generate correct samples for a given instruction and markedly improves generation quality. Experiments demonstrate that the synthetic datasets preserve privacy, leaking no sensitive information from the original data, while remaining highly suitable for training models and for further analysis of real-world data. Notably, we show that training classifiers on private synthetic data outperforms training classifiers directly on real data with DP-SGD.
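
The pipeline summarized above (differentially private training followed by sampling) can be illustrated with a short sketch. The code below is a minimal, hypothetical example using PyTorch and Opacus, not the authors' implementation: a toy next-token model is trained with DP-SGD and then sampled autoregressively. The model architecture, vocabulary size, hyperparameters, and random stand-in corpus are all illustrative assumptions, and the paper's Prompt Variance Loss is omitted because its definition appears in the full text, not in this abstract.

```python
# Minimal sketch of global DP for synthetic data: train a generative
# model with DP-SGD (Opacus), then sample from it. All names and
# numbers here are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

VOCAB, CTX = 1000, 32  # assumed toy vocabulary size and context length

class TinyCausalLM(nn.Module):
    """A toy next-token predictor built from Opacus-supported layers."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 64)
        self.hidden = nn.Linear(64, 64)
        self.head = nn.Linear(64, VOCAB)

    def forward(self, ids):
        h = torch.tanh(self.hidden(self.embed(ids)))
        return self.head(h)  # (batch, seq, vocab) logits

# Stand-in "private" corpus: random token ids shaped (N, CTX).
data = torch.randint(0, VOCAB, (256, CTX))
loader = DataLoader(TensorDataset(data), batch_size=16)

model = TinyCausalLM()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Attach DP-SGD: per-sample gradient clipping plus Gaussian noise, which
# makes the trained model differentially private w.r.t. the corpus.
engine = PrivacyEngine()
model, optimizer, loader = engine.make_private(
    module=model, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.0, max_grad_norm=1.0,  # assumed privacy knobs
)

loss_fn = nn.CrossEntropyLoss()
for (batch,) in loader:
    if batch.numel() == 0:  # Poisson sampling can yield empty batches
        continue
    optimizer.zero_grad()
    logits = model(batch[:, :-1])  # predict token t+1 from tokens <= t
    loss = loss_fn(logits.reshape(-1, VOCAB), batch[:, 1:].reshape(-1))
    loss.backward()
    optimizer.step()

print(f"epsilon spent: {engine.get_epsilon(delta=1e-5):.2f}")

# Sampling: autoregressively draw tokens from the DP-trained model.
ids = torch.randint(0, VOCAB, (1, 1))
with torch.no_grad():
    for _ in range(CTX - 1):
        probs = torch.softmax(model(ids)[:, -1], dim=-1)
        ids = torch.cat([ids, torch.multinomial(probs, 1)], dim=1)
print("synthetic token ids:", ids.tolist())
```

Because differential privacy is closed under post-processing, the sampling loop spends no privacy budget beyond the epsilon reported by the accountant; this is what lets the synthetic dataset be released and reused freely.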

Published

2024-09-22

How to Cite

Shalini Agarwal. (2024). Generating Synthetic Data for Deep Neural Network Classification Using Global Differential Privacy Based Optimization. Journal of Computational Analysis and Applications (JoCAAA), 33(05), 898–905. Retrieved from http://eudoxuspress.com/index.php/pub/article/view/659

