Google's Project Zero security research team has used LLM-assisted detection to find a previously unknown memory-safety vulnerability. “We believe this is the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software,” the team wrote in a post.
Project Zero is a Google security research team that studies zero-day vulnerabilities. In June it announced Project Naptime, a framework for LLM-assisted vulnerability research. Since then, Project Zero has teamed up with Google DeepMind to evolve Naptime into Big Sleep, the agent that discovered the vulnerability.
The vulnerability Big Sleep discovered was a stack buffer overflow in SQLite. The Project Zero team reported it to the SQLite developers in October, and they fixed it the same day. The vulnerability was also caught before it appeared in an official release.
“We think that this work has tremendous defensive potential,” the Project Zero team wrote. “Finding vulnerabilities in software before it’s even released, means that there’s no scope for attackers to compete: the vulnerabilities are fixed before attackers even have a chance to use them.”
According to Project Zero, SQLite's existing testing infrastructure, including OSS-Fuzz and the project's own test setup, had not found the vulnerability.
The result follows earlier work this year from Team Atlanta, a security research team that also used LLM-assisted detection to find a vulnerability in SQLite. Project Zero cited that work as inspiration for its own research.
Project Zero called it exciting that Big Sleep found a vulnerability in a well-fuzzed open source project, but cautioned that the results are still experimental and that, for now, a target-specific fuzzer would likely be at least as effective at finding vulnerabilities.
“We hope that in the future this effort will lead to a significant advantage to defenders – with the potential not only to find crashing testcases, but also to provide high-quality root-cause analysis, triaging and fixing issues could be much cheaper and more effective in the future. We aim to continue sharing our research in this space, keeping the gap between the public state-of-the-art and private state-of-the-art as small as possible,” the team concluded.