Extracting Training Data from Large Language Models



We introduce the first practical training data extraction attack on a production neural language model. With query access to GPT-2 (trained on 40GB of text), we can extract hundreds of individual examples that were used to train the model. These extracted examples include personally identifiable information, IRC conversations, copyrighted code, and 128-bit UUIDs. Most worryingly, we find that extraction attacks become much easier as models become larger.
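The attack works by sampling many generations from the model and ranking them by how likely they are to be memorized training data, e.g. by comparing the model's perplexity on a sample against a reference measure such as its zlib compression entropy. The sketch below illustrates that ranking idea; the `perplexity` callback is a stand-in for a real language-model scorer, and the exact scoring rule here is an illustrative assumption, not the paper's precise formula.

```python
import math
import zlib


def zlib_entropy_bits(text: str) -> float:
    """Compressed length in bits: a cheap, model-free proxy for how
    'surprising' a string is. Highly repetitive text compresses well."""
    return 8.0 * len(zlib.compress(text.encode("utf-8")))


def rank_candidates(samples, perplexity):
    """Rank generated samples by the ratio of model log-perplexity to
    zlib entropy. A sample the model finds unusually easy (low perplexity)
    relative to its compressibility is a candidate memorized example.

    `perplexity` is a hypothetical callback: str -> float, returned by
    whatever language model is being attacked.
    """
    scored = [
        (math.log(perplexity(s)) / zlib_entropy_bits(s), s)
        for s in samples
    ]
    scored.sort(key=lambda pair: pair[0])  # lowest ratio = most suspicious
    return [s for _, s in scored]
```

In practice one would generate hundreds of thousands of samples, score each with the target model, and manually inspect only the top-ranked handful.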

The talk is based on a paper that will appear at USENIX Security 2021, authored by Nicholas Carlini (Google), Florian Tramer (ETHZ), Eric Wallace (Stanford), Matthew Jagielski (Northeastern University), Ariel Herbert-Voss (OpenAI, Harvard), Katherine Lee (Google), Adam Roberts (Google), Tom Brown (OpenAI), Dawn Song (UC Berkeley), Ulfar Erlingsson (Apple), Alina Oprea (Northeastern University), and Colin Raffel (Google).



Nicholas Carlini, PhD

Research Scientist,
Google Brain



Nicholas Carlini is a research scientist at Google Brain. He studies the security and privacy of machine learning, for which he has received best paper awards at ICML, USENIX Security, and IEEE S&P. He obtained his PhD from the University of California, Berkeley in 2018.