AI researchers got chatbots to share cocaine recipes using this wild trick



short

  • Researchers got groundbreaking AI models to generate cocaine manufacturing instructions using a new rapid injection attack.
  • The same technology manipulated the AI ​​encoder to upload sensitive credentials.
  • The study says the immediate injection stems from “role confusion,” not just models that fail to recognize malicious stimuli.

Forget clever claims: AI researchers say they tricked leading AI models into generating instructions for making cocaine by convincing them the dangerous thoughts were their own, while also manipulating the process of making cocaine. Artificial intelligence coding agent In leaking sensitive credentials.

in the paper”Immediate injection as role confusion”, which was presented at the International Conference on Machine Learning in June, researchers Charles Ye, Jasmine Cui, and Dylan Hadfield Mennell say that both Immediate injection Demonstrations of the attack stem from a structural flaw in how large language models (LLMs) distinguish trusted instructions from untrusted text.

“For the MBA, everything arrives through the same channel like a long symbolic soup,” the team wrote. “There are his own ideas next to your instructions, which are next to the contents of a random web page he just fetched.”

The paper also pointed to what the researcher called “role confusion,” where models rely on writing style rather than role tags to determine whether commands are trustworthy. Instead of recognizing attacker-controlled content as external input, the researchers found that models could mistake it for legitimate user commands — or even their own internal logic.

“Think of it from the LLM’s point of view. When he sees the text of his previous reasoning, he implicitly trusts his conclusions. This is the whole point of reasoning: if the LLM had to re-draw the same conclusions, the reasoning would be useless,” they wrote. “So I think the text has a kind of overall trust. Combined with our previous findings, this suggests that if you can make inserted text look like model logic, you can steal that trust.”

The attack is called Chain of Thought (CoT) forgery, which introduces fake logic that mimics the model’s internal thought process. Models that normally reject illegal requests instead generate instructions for synthesizing cocaine after accepting the fabricated logic as their own.

The researchers said technique Jailbreak success rates increased from almost zero to around 60% across models they tested, including OpenAI’s GPT-5 nano, mini, full, o4-mini, gpt-oss-20b, and gpt-oss-120b. They also said it works on the GLM-4.6, Kimi-K2-Instruct, and MiniMax-M2.

In the experiment, the researchers said they were also able to trick the AI ​​coding agent into loading the SECRETS.env file after hiding the malicious instructions in the web page.

“Using our sensors, we found that simply adding the word ‘user’ in front of the command causes the model to perceive the command as more likely to be real user text (i.e., an increase in user level),” they wrote. “In other words, the attacker can just claim the role played by the text, and LLM believes him.”

The study comes as spot injection attacks continue to expose vulnerabilities in artificial intelligence agents. In April, researchers from Google to caution The malicious web pages were hiding invisible instructions designed to trick AI agents into leaking credentials, deleting files, and even sending PayPal payments.

In June, Microsoft disclosed an instantaneous injection vulnerability in the Anthropic system Claude Code A GitHub action that could have exposed credentials stored in software development pipelines. Days later, another reference study Found AI agents powered by GPT-5 and Gemini still fail the majority of flash injection attacks, despite improvements in model capabilities.

Daily debriefing Newsletter

Start each day with the latest news, plus original features, podcasts, videos and more.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *