Synthetic Data -- Anonymisation Groundhog Day



Synthetic data has been advertised as a silver-bullet solution for privacy-preserving data publishing that addresses the shortcomings of traditional anonymisation techniques. The promise is that synthetic data drawn from generative models preserves the statistical properties of the original dataset while, at the same time, providing perfect protection against privacy attacks.

In this talk, I will present our recent work in which we quantitatively evaluate such claims and compare the privacy gain of synthetic data publishing to that of traditional anonymisation techniques. Our evaluation of a wide range of state-of-the-art generative models demonstrates that synthetic data either does not prevent inference attacks or does not retain data utility. In other words, we empirically show that synthetic data does not provide a better tradeoff between privacy and utility than traditional anonymisation techniques.



Theresa Stadler, M. Sc.

PhD Research Assistant
EPFL (Switzerland)



Theresa Stadler is a PhD Research Assistant at the SPRING Lab at EPFL (Switzerland), led by Carmela Troncoso. Her research focuses primarily on the privacy aspects of data processing systems. She previously worked as a privacy researcher at Privitar, a London-based scale-up, where she developed enterprise software that implements privacy-enhancing technologies and aims to make these technologies available to organisations at scale. She holds a Master's degree in Neural Information Processing (Biomathematics) from the University of Tübingen (Germany), where she also conducted research in applied machine learning on biomedical data.