Observed for all evaluated backbones in this deployment.
Example Trace
Harmless-looking content can carry executable instructions.
Evaluation Results
Recent backbones still fail under realistic indirect prompt injection.
Gemini 3.1 Pro has the highest case-weighted total ASR.
Claude Opus 4.6 has the lowest total ASR in this evaluation.
Evaluated under the same surface and malicious-goal structure.
Attack success rate by surface
Lower ASR indicates fewer successful malicious-goal completions.
| Surface | GPT-5.3-Codex | Claude Opus 4.6 | Gemini 3.1 Pro | Kimi K2.5 | GLM-5 |
|---|---|---|---|---|---|
| Group chat (n=15) | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| Email (n=50) | 20.0% | 2.0% | 12.0% | 6.0% | 6.0% |
| Local Docs (n=50) | 34.0% | 0.0% | 30.0% | 12.0% | 50.0% |
| Repo Links (n=4) | 100.0% | 50.0% | 100.0% | 100.0% | 100.0% |
| Gist (n=50) | 0.0% | 0.0% | 20.0% | 0.0% | 0.0% |
| Total ASR (n=169) | 27.2% | 10.7% | 29.6% | 16.6% | 27.8% |
Evaluating agentic risk where prompt injection actually enters the workflow.
AI agents such as OpenClaw increasingly operate with access to external tools, local files, web content, messaging channels, email, repositories, and wallet interfaces. LivePI evaluates indirect prompt-injection risk in a real virtual-machine deployment with live but test-controlled channels, covering seven input surfaces, twelve attack or rendering families, and five malicious goals.
Production-Like Benchmark
LivePI contains 169 executable attack instances across group chat, email, local files, repository links, and public Gists.
Cross-Backbone Evaluation
The benchmark compares GPT-5.3-Codex, Claude Opus 4.6, Gemini 3.1 Pro, Kimi K2.5, and GLM-5 under the same surface and goal structure.
Executable Outcome Traces
Each case records prompts, tool calls, tool outputs, and final responses so attack success can be judged from concrete agent behavior.
Benchmark Design
Attack surfaces, malicious goals, and executable trials.
Untrusted Surface
Inject content through a realistic channel the agent naturally reads during a task.
Place Attack
Insert the malicious instructions into harmless-looking content.
Goal Completion
Score whether the agent both follows the injected instruction and completes the malicious effect.
Surfaces Seven live or tool-mediated input channels WhatsApp, Telegram, Slack, Email, Local Docs, Repo Links, and Gist
Construction 169 executable cases across 12 attack families Only feasible surface, technique, and goal combinations are instantiated
Each LivePI case is built as a concrete end-to-end episode rather than a static prompt. We first select a channel that a real OpenClaw workflow can touch, then pair it with an attack rendering that plausibly appears as external content, and finally bind it to a malicious goal with observable side effects.
The benchmark deliberately avoids the full Cartesian product of surfaces, techniques, and goals. Instead, it instantiates combinations that are executable in the deployed VM: email attacks arrive through a mailbox, repository attacks are fetched during coding workflows, local-document attacks are read from disk, and chat attacks are delivered through shared messaging channels.
Goals Five malicious objectives spanning confidentiality, integrity, and financial abuse Protected data, security controls, unsafe code, inbox summaries, and crypto transfer
- Protected-information exfiltration
- Unauthorized security-control changes
- Unsafe code retrieval or execution
- Inbox-summary exfiltration
- Unauthorized cryptocurrency transfer
Citation
If you use LivePI, please cite the paper.
@misc{zhao2026livepi,
title={LivePI: More Realistic Benchmarking of AI Agents Against Indirect Prompt Injection},
author={Zhao, Lei and Bhaskar, Abhay and Dobriban, Edgar},
year={2026},
note={Preprint}
}