What do software supply chain security and generative AI have to do with each other? Until recently, the answer was “not much.”
But that’s changing due to a new type of software supply chain risk known as package hallucination. Package hallucination creates novel opportunities for threat actors to plant malicious code within software supply chains and prey on developers who use generative AI to write code.
Here’s a look at how this type of attack works, why it adds a new layer of difficulty to software supply chain security, and what enterprises can do to stay ahead of this challenge.
What Is Package Hallucination?
Package hallucination happens when a large language model (LLM) references a software library, module, or other type of package that does not actually exist.
For instance, imagine you’re using an AI tool like GitHub Copilot to help develop a Python application, and it spits out a line of code like the following:
import advancedmathlib
No Python module or package named advancedmathlib exists. If Copilot generated code like this, it would be hallucinating.
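A quick way to catch a candidate hallucination before running AI-generated code is to list the modules it imports and see which ones can’t be found. The sketch below uses Python’s standard `ast` module and `importlib.util.find_spec`; the function names are hypothetical. Treat the result as a hint, not a verdict: a module can be missing simply because it isn’t installed yet, and a module that does resolve may still be one an attacker registered. Import names also don’t always match package names (for example, `import yaml` is provided by the PyYAML package).

```python
import ast
import importlib.util


def top_level_imports(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from . import utils".
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


def unresolved_imports(source: str) -> list[str]:
    """List imported top-level modules that can't be found in the current environment."""
    return sorted(
        name for name in top_level_imports(source)
        if importlib.util.find_spec(name) is None
    )


generated = "import json\nimport advancedmathlib\n"
print(unresolved_imports(generated))
```

In an environment where nothing named `advancedmathlib` is installed, the call reports that name and not `json`, which ships with Python.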
How Package Hallucination Affects Supply Chain Security
In some cases, AI-generated code that references a nonexistent package would simply fail to build or run, because the package manager can’t find the package to install, or the application can’t find it to import.
But something more insidious could happen. If threat actors were to publish a package under the same name an AI model hallucinated, and inject malicious code into it, developers who install the AI-suggested dependency would likely download and run that malicious code.
Note, too, that the package does not need to be malicious at the outset. It could start out legitimate, then beacon to a command-and-control server that later updates it with malicious code, so a one-time scan of the package’s contents won’t always reveal the risk.
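Because an attacker can register a hallucinated name, checking that a package merely exists isn’t enough. The sketch below, offered as an illustration rather than a vetted tool, queries PyPI’s JSON API (`https://pypi.org/pypi/<name>/json`, which returns HTTP 404 for unknown projects), treats a missing project as a likely hallucination, and treats a very recent one as worth a closer look. The 90-day threshold and the three labels are arbitrary assumptions, and the code reads the response’s `releases` field, which PyPI’s documentation describes as deprecated in favor of the Simple API. It also can’t detect a package that behaves well at first and turns malicious later, the scenario described above, so it complements ongoing monitoring rather than replacing it.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Optional

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_pypi_metadata(name: str) -> Optional[dict]:
    """Fetch a project's metadata from PyPI's JSON API; None means no such project."""
    try:
        with urllib.request.urlopen(PYPI_JSON_URL.format(name=name), timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def first_upload(metadata: dict) -> Optional[datetime]:
    """Earliest file upload time across all releases listed in the response."""
    times = [
        # Replace the trailing "Z" so older Pythons' fromisoformat can parse it.
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in metadata.get("releases", {}).values()
        for f in files
    ]
    return min(times) if times else None


def assess(metadata: Optional[dict], min_age_days: int = 90,
           now: Optional[datetime] = None) -> str:
    """Label a dependency 'missing', 'new' (possibly squatted), or 'established'."""
    if metadata is None:
        return "missing"  # the name isn't on PyPI: a likely hallucination
    now = now or datetime.now(timezone.utc)
    earliest = first_upload(metadata)
    if earliest is None or now - earliest < timedelta(days=min_age_days):
        return "new"  # exists, but too recent (or empty) to trust by default
    return "established"  # older, but age alone doesn't prove it is safe
```

A typical call would be `assess(fetch_pypi_metadata("some-package"))`; keeping the network fetch separate from the classification makes the logic easy to test offline.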
In this way, AI package hallucination creates novel opportunities for attackers to poison software supply chains.
To date, no real-world attack exploiting package hallucination is known to have occurred. But researchers at Lasso Security showed how easily such an attack could happen. They found that AI models hallucinated package names surprisingly often and tended to repeat the same hallucinated names, with Gemini, the AI service from Google, referencing at least one hallucinated package in response to nearly two-thirds of all prompts issued by the researchers.
Even more striking, the researchers uploaded a “dummy” package with one of the hallucinated names to a public repository and found that it was downloaded more than 30,000 times in a matter of weeks. This strongly suggests that many developers trust AI-generated code that references hallucinated packages without checking it, and that threat actors could exploit this risk with little effort.
A New Twist on an Old Story: Package Hallucination vs. Typosquatting
If software supply chain exploits involving AI hallucination seem familiar, it’s probably because they resemble other types of supply chain attacks. The closest parallel is package typosquatting, a technique threat actors have long used to trick software developers into incorporating malicious code into applications.
Package typosquatting involves uploading malicious packages with names that are similar, but not identical, to popular software packages. For instance, an attacker typosquatting on PyTorch (a legitimate, widely used Python library) might name a package PyTorchh or Py_Torch. Through carelessness when coding or browsing software repositories, developers might accidentally import the malicious package into their applications.
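Typosquatted names can often be caught with a simple similarity check. This minimal sketch normalizes names the way PEP 503 does (lowercase, with runs of `-`, `_`, and `.` collapsed to `-`) and uses the standard library’s `difflib.get_close_matches` to flag names that closely resemble, but don’t equal, a popular package. The `popular_packages` list and the 0.85 cutoff are illustrative assumptions; `pytorch` appears only to mirror the example above (PyTorch is actually published on PyPI as `torch`), and a real check would compare against a list of the most-downloaded packages. The heuristic does nothing against hallucinated names, which needn’t resemble any real package.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Normalize a package name as PEP 503 does."""
    return re.sub(r"[-_.]+", "-", name).lower()


def lookalikes(candidate: str, popular: list[str], cutoff: float = 0.85) -> list[str]:
    """Return popular names that `candidate` closely resembles but does not equal."""
    cand = normalize(candidate)
    known = [normalize(p) for p in popular]
    if cand in known:
        return []  # exact match: it is the popular package, not a lookalike
    return difflib.get_close_matches(cand, known, n=3, cutoff=cutoff)


popular_packages = ["pytorch", "numpy", "requests"]
for name in ["PyTorchh", "Py_Torch", "pytorch", "flask"]:
    print(name, "->", lookalikes(name, popular_packages))
```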
However, compared to package typosquatting, AI package hallucination has the potential to be more insidious and harmful, for several reasons:
● When used as a software supply chain attack method, package hallucination is likely to have a much higher success rate than typosquatting because it doesn’t rely on errant keystrokes to trigger a successful attack. Instead, it exploits the tendency of programmers to run AI-generated code without reviewing or validating it first.
● Developers are more likely to fall for package hallucination attacks because they may assume that code generated by popular AI-assisted development tools can be trusted.
● For attackers, identifying commonly hallucinated package names doesn’t require highly specialized skills or tremendous amounts of time and effort. They can simply generate code using AI services, then scan it for repeated instances of hallucinated package names, using the same method as the Lasso Security researchers.
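The counting step described in the last bullet takes only a few lines, which is part of why the barrier is so low. Assuming you already have a list of AI responses and a set of known package names (for instance, a snapshot of PyPI’s project list, normalized per PEP 503), the hypothetical `repeated_unknown` function below counts how many responses ask the user to `pip install` each unknown name. The regex is a deliberately crude heuristic that looks only at `pip install` commands. Defenders can run the same analysis to build a watchlist of names to block before anyone registers them.

```python
import re
from collections import Counter

# A "pip install ..." command, up to the end of the line or a closing backtick.
PIP_INSTALL = re.compile(r"\bpip3?\s+install\s+([^\n`]+)")


def installed_names(response: str) -> set[str]:
    """Normalized package names that one AI response tells the user to pip-install."""
    names = set()
    for args in PIP_INSTALL.findall(response):
        for token in args.split():
            if token.startswith("-"):
                continue  # skip options such as -U or --upgrade
            # Strip version specifiers and extras, e.g. "pkg==1.2" or "pkg[extra]".
            name = re.split(r"[=<>~!\[;]", token, maxsplit=1)[0]
            if name:
                names.add(re.sub(r"[-_.]+", "-", name).lower())
    return names


def repeated_unknown(responses: list[str], known: set[str],
                     min_count: int = 2) -> dict[str, int]:
    """Unknown names mentioned in at least `min_count` responses, most frequent first."""
    counts = Counter()
    for response in responses:
        # Each response counts at most once per name.
        counts.update(n for n in installed_names(response) if n not in known)
    return {name: c for name, c in counts.most_common() if c >= min_count}


sample = ["Run `pip install requests fakelib-xyz` first.", "pip install -U fakelib_xyz==1.0"]
print(repeated_unknown(sample, known={"requests"}))
```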
In short, package hallucination will likely prove easier for threat actors to exploit, and lead to a higher rate of malicious package downloads, than traditional approaches to injecting malicious code into software supply chains.
Protecting Your Software Supply Chain From Package Hallucination
The good news is that protecting software supply chains from this new type of risk boils down to leveraging the defenses that enterprises should already have in place, such as:
● Generating a Software Bill of Materials (SBOM) for applications they develop. SBOMs identify the software components within applications, making it easier to determine whether they include any hallucinated packages that may contain malicious code. (Unfortunately, IDC research shows that only 28 percent of enterprises automatically generate SBOMs.)
● Using Software Composition Analysis (SCA) tools to scan codebases for vulnerable components, including unrecognized packages that may have been hallucinated.
● Establishing guidelines and policies for AI-assisted software development, such as rules requiring developers to validate third-party software components before integrating them into a codebase.
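As a rough illustration of the first and third bullets together, the sketch below inventories the packages installed in the current Python environment with the standard library’s `importlib.metadata`, emits them as a minimal CycloneDX-style document, and reports anything not on an approved list. The allowlist contents are hypothetical, and the output is not validated against the CycloneDX schema; dedicated SBOM generators and SCA tools record much more, such as file hashes, licenses, and the dependency graph.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """PEP 503-style name normalization, used for purls and allowlist matching."""
    return re.sub(r"[-_.]+", "-", name).lower()


def minimal_sbom() -> dict:
    """Inventory installed Python distributions as a minimal CycloneDX-style dict."""
    components = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip distributions with broken metadata
        components.append({
            "type": "library",
            "name": name,
            "version": dist.version,
            "purl": f"pkg:pypi/{normalize(name)}@{dist.version}",
        })
    components.sort(key=lambda c: c["name"].lower())
    return {"bomFormat": "CycloneDX", "specVersion": "1.5", "components": components}


def unapproved(sbom: dict, allowlist: set[str]) -> list[str]:
    """Normalized component names that are not on the (hypothetical) allowlist."""
    allowed = {normalize(a) for a in allowlist}
    return sorted({normalize(c["name"]) for c in sbom["components"]} - allowed)


if __name__ == "__main__":
    sbom = minimal_sbom()
    print(f"{len(sbom['components'])} components inventoried")
    print("Not on allowlist:", unapproved(sbom, {"pip", "setuptools"}))
```

Any name this reports still needs a human or an SCA tool to decide whether it is legitimate, unvetted, or hallucinated.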
Software supply chain security was already a serious challenge, with attacks surging in recent years. The package hallucination risk suggests that the problem is likely to grow even worse, making it all the more important for enterprises to invest in effective software supply chain defense and visibility solutions.