Title:
Non-Halting Vulnerability in AI Agents
Abstract:
We introduce a new vulnerability that exploits fixed points in autoregressive models and use it to craft queries that never halt. More precisely, for non-halting queries, the LLM never samples the end-of-string token <eos>. We rigorously analyze the conditions under which the non-halting anomaly presents itself. In particular, we prove that at temperature zero, if a repeating (cyclic) token sequence persists in the output for longer than the context size, then the LLM does not halt.
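The temperature-zero argument can be illustrated with a toy simulation. The sketch below assumes a hypothetical `next_token` function that stands in for greedy decoding over a fixed context window; `halts`, `toy_next_token`, and the token ids are illustrative names, not the authors' code or proof. Because the next token depends only on the current window, decoding behaves as a finite-state process: once a window recurs, the same tokens follow forever and <eos> is never produced.

```python
from typing import Callable, Sequence

Window = tuple[int, ...]

def halts(next_token: Callable[[Window], int], prompt: Sequence[int],
          context_size: int, eos: int) -> bool:
    """Decide whether greedy (temperature-zero) decoding ever emits `eos`.

    The next token is a deterministic function of the last `context_size`
    tokens, so the window at step t+1 is a function of the window at step t.
    If a window repeats, generation is periodic from then on and `eos`
    is never produced. With a finite vocabulary this loop always terminates.
    """
    seq = list(prompt)
    seen: set[Window] = set()
    while True:
        window = tuple(seq[-context_size:])
        if window in seen:
            return False  # state revisited: the model cycles and never halts
        seen.add(window)
        tok = next_token(window)
        if tok == eos:
            return True
        seq.append(tok)

# Hypothetical toy "model" (not a real LLM): it keeps an a,b,a,b
# pattern going and emits EOS otherwise.
EOS = 0

def toy_next_token(window: Window) -> int:
    if len(window) >= 4 and window[-1] == window[-3] and window[-2] == window[-4]:
        return window[-2]  # continue the 2-cycle
    return EOS

print(halts(toy_next_token, [5, 7, 5, 7], context_size=4, eos=EOS))  # False
print(halts(toy_next_token, [5, 7, 9], context_size=4, eos=EOS))     # True
```

Real models have context windows of thousands of tokens, so the cycle must persist correspondingly longer before the window recurs; the toy uses a window of 4 only to keep the trace short.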
In this talk we will share some of our first results. We demonstrate non-halting queries in many experiments on base (unaligned) models, where repeating prompts immediately lead to the non-halting cyclic behavior predicted by the analysis. We devise an experiment showing that the recipe for creating non-halting queries succeeds at rates ranging from 97% for GPT-4o to 19% for Gemini Pro 1.5. We also study gradient-based direct inversion to craft new short prompts that induce the non-halting state. We inverted 10,000 random repeating 2-cycle outputs for llama-3.1-8b-instruct; of the resulting 10,000 three-token inverted prompts, 1,512 yield non-halting queries, a rate of about 15%. Our experiments with ARCA show that non-halting may be easily induced with as few as 3 input tokens with high probability. Overall, our experiments demonstrate that non-halting queries are prevalent in LLMs and relatively easy to find.
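A practical counterpart is checking an observed token stream for such a cycle. The following sketch uses a hypothetical `detect_cycle` helper and made-up token ids: it tests whether the last `context_size + p` tokens repeat with period `p`. If they do, the final context window equals the window `p` steps earlier, which under greedy decoding is the recurring state behind the non-halting result. The repeating prompt in the usage lines is only a stand-in for the kind of repeating prompt mentioned above, not the authors' recipe.

```python
from typing import Optional, Sequence

def detect_cycle(tokens: Sequence[int], context_size: int,
                 max_period: int = 8) -> Optional[int]:
    """Return the smallest period p <= max_period such that the last
    context_size + p tokens repeat with period p, or None if there is none.

    When this holds, tokens[-context_size:] equals the window p steps
    earlier, so a temperature-zero model has revisited a state.
    """
    for p in range(1, max_period + 1):
        span = context_size + p
        if len(tokens) < span:
            break  # longer periods would need even more tokens
        tail = tokens[-span:]
        # Compare each of the last context_size tokens with the one p earlier.
        if all(tail[i] == tail[i - p] for i in range(p, span)):
            return p
    return None

# Hypothetical repeating prompt: a 2-token cycle repeated six times.
prompt = [101, 202] * 6
print(detect_cycle(prompt, context_size=8))                  # 2
print(detect_cycle(list(range(1, 11)), context_size=8))      # None
```

Checking only the last `context_size + p` tokens is enough because, at temperature zero, the model's entire state is its current context window.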
Finally, we will share some views on how the non-halting vulnerability may present itself in next-generation AI deployments and its impact on the growing agent ecosystem.
Bio:
Berk Sunar is a Professor of Electrical and Computer Engineering at Worcester Polytechnic Institute, where he is also the founder of the Vernam Applied Cryptography and Cybersecurity Laboratory. His recent research interests include AI for security, security for AI, microarchitectural security, and lattice-based cryptography.
Speaker:
Prof. Berk Sunar
Vernam Applied Cryptography and Cybersecurity Laboratory
Electrical and Computer Engineering
Worcester Polytechnic Institute
USA