LivePI: More Realistic Benchmarking of AI Agents Against Indirect Prompt Injection

Example Trace

Harmless-looking content can carry executable instructions.

Example LivePI attack trace showing a benign email-summary request followed by injected instructions in retrieved email content.

Evaluation Results

Recent backbones still fail under realistic indirect prompt injection.

100.0% Group Chat ASR

Observed for all evaluated backbones in this deployment.

29.6% Highest Total ASR

Gemini 3.1 Pro has the highest case-weighted total ASR.

10.7% Lowest Total ASR

Claude Opus 4.6 has the lowest total ASR in this evaluation.

5 Recent Backbones

Evaluated under the same surface and malicious-goal structure.

Attack success rate by surface

Lower ASR indicates fewer successful malicious-goal completions.

Surface	GPT-5.3-Codex	Claude Opus 4.6	Gemini 3.1 Pro	Kimi K2.5	GLM-5
Group chat (n=15)	100.0%	100.0%	100.0%	100.0%	100.0%
Email (n=50)	20.0%	2.0%	12.0%	6.0%	6.0%
Local Docs (n=50)	34.0%	0.0%	30.0%	12.0%	50.0%
Repo Links (n=4)	100.0%	50.0%	100.0%	100.0%	100.0%
Gist (n=50)	0.0%	0.0%	20.0%	0.0%	0.0%
Total ASR (n=169)	27.2%	10.7%	29.6%	16.6%	27.8%

Evaluating agentic risk where prompt injection actually enters the workflow.

AI agents such as OpenClaw increasingly operate with access to external tools, local files, web content, messaging channels, email, repositories, and wallet interfaces. LivePI evaluates indirect prompt-injection risk in a real virtual-machine deployment with live but test-controlled channels, covering seven input surfaces, twelve attack or rendering families, and five malicious goals.

Production-Like Benchmark

LivePI contains 169 executable attack instances across group chat, email, local files, repository links, and public Gists.

Cross-Backbone Evaluation

The benchmark compares GPT-5.3-Codex, Claude Opus 4.6, Gemini 3.1 Pro, Kimi K2.5, and GLM-5 under the same surface and goal structure.

Executable Outcome Traces

Each case records prompts, tool calls, tool outputs, and final responses so attack success can be judged from concrete agent behavior.

Benchmark Design

Attack surfaces, malicious goals, and executable trials.

Untrusted Surface

Inject content through a realistic channel the agent naturally reads during a task.

Place Attack

Insert the malicious instructions into harmless-looking content.

Goal Completion

Score whether the agent both follows the injected instruction and completes the malicious effect.

Surfaces Seven live or tool-mediated input channels WhatsApp, Telegram, Slack, Email, Local Docs, Repo Links, and Gist

Group ChatWhatsApp, Telegram, and Slack messages from unverified participants enter the agent's conversation context.

EmailGmail content is retrieved from a monitored mailbox and inserted as tool-returned evidence.

Local DocsInjected handoff files are read from disk during otherwise benign file-inspection workflows.

Repo LinksAttacker-controlled repositories or packages are fetched, installed, or imported during setup tasks.

GistPublic tutorial-style web content is retrieved as reference material and becomes prompt-visible context.

Construction 169 executable cases across 12 attack families Only feasible surface, technique, and goal combinations are instantiated

Each LivePI case is built as a concrete end-to-end episode rather than a static prompt. We first select a channel that a real OpenClaw workflow can touch, then pair it with an attack rendering that plausibly appears as external content, and finally bind it to a malicious goal with observable side effects.

The benchmark deliberately avoids the full Cartesian product of surfaces, techniques, and goals. Instead, it instantiates combinations that are executable in the deployed VM: email attacks arrive through a mailbox, repository attacks are fetched during coding workflows, local-document attacks are read from disk, and chat attacks are delivered through shared messaging channels.

Goals Five malicious objectives spanning confidentiality, integrity, and financial abuse Protected data, security controls, unsafe code, inbox summaries, and crypto transfer

Protected-information exfiltration
Unauthorized security-control changes
Unsafe code retrieval or execution
Inbox-summary exfiltration
Unauthorized cryptocurrency transfer

Citation

If you use LivePI, please cite the paper.

@misc{zhao2026livepi,
  title={LivePI: More Realistic Benchmarking of AI Agents Against Indirect Prompt Injection},
  author={Zhao, Lei and Bhaskar, Abhay and Dobriban, Edgar},
  year={2026},
  note={Preprint}
}

Lei Zhao, Abhay Bhaskar, and Edgar Dobriban

University of Pennsylvania

Correspondence to LZ (leizhao7@wharton.upenn.edu) and ED (dobriban@wharton.upenn.edu).

Safety note: This project studies indirect prompt-injection attacks in controlled accounts and assets. Operational payload details are redacted or bounded where appropriate.