The Promptware Kill Chain · Securing Money-Moving Agents

A money-moving agent does not get compromised the way a server does. Nobody steals a credential or pops a shell. The agent keeps using exactly the tools it was issued, with exactly the permissions it was granted, and every step shows up in logs as normal work. The difference is that the instructions driving those tools came from an attacker, not from you.

That is the uncomfortable property of indirect prompt injection. It turns the model's willingness to follow instructions into an execution primitive. Researchers have started calling the multi-step version of this "promptware": a single injected string that evolves into something with the shape of malware, moving through recognizable stages toward an objective. For an agent that can initiate payments, the objective is a transfer.

This module walks that chain end to end. Authentication and intent (module 2), credential scoping (module 3), and gateways (module 6) are the controls that break it; here we stay focused on the attack itself, so you can see exactly where those controls have to land.

Why this is different from classic injection

Old prompt injection was a chat problem. Someone tricked a model into ignoring its system prompt and saying something it should not. Annoying, low blast radius.

The money-moving agent changes the stakes because the model now sits in front of tools: a payments API, a database, a file store, a message queue. An instruction the model accepts as guidance can become a tool call that does real work. Standard access controls do not help, because the agent is authorized to make that call. The injected content is not breaking a permission boundary. It is using one you already opened.

This is why we treat the agent's tool surface as an execution environment, not a feature list. Every tool the agent can reach is something an attacker can reach through it.

The stages

The promptware chain maps onto a familiar intrusion sequence. We use five stages because they line up with where defenses go.

1. Initial access

The injected instruction enters through any content the agent reads and treats as trustworthy. An email it summarizes. A support ticket. A PDF invoice. A web page it fetches. A row in a database written by a previous, poisoned run.

The defining trait is that the agent cannot tell data from instructions inside its own context window. Untrusted text and trusted text share one channel, and the model weighs them together.

2. Tool misuse

Once the instruction is accepted, it directs the agent to call tools toward the attacker's goal. No credentials are stolen. The agent uses its own authorized tools, pointed at the wrong target. In logs this is the hardest stage to spot, because a misused tool call is structurally identical to a legitimate one.

3. Exfiltration

The attacker needs data out: account numbers, balances, an approval token, the contents of a file the agent can read. The cleanest exfiltration paths do not require a custom tool. The most documented one is markdown image rendering. The agent is instructed to emit an image link whose URL embeds the stolen data, and the moment a browser or client renders that markdown, it makes a request to the attacker's server and the data leaks with no further interaction.

This exact pattern has been confirmed across many production systems, which is why image rendering and outbound URLs from agent output deserve the same scrutiny as a network egress rule.

4. Lateral movement

The agent reaches into connected systems and, increasingly, other agents. A poisoned record it writes becomes the initial-access payload for the next agent that reads it. This is how injection stops being a single-session event and becomes persistent across a fleet. Multi-step incident analysis through 2025 and 2026 shows lateral movement and persistence appearing in a growing share of documented promptware attacks, where they were essentially absent in 2023.

5. Action on objective: the transfer

The final tool call moves money. For a money-moving agent this is the whole point of the chain, and by the time the agent reaches it, every prior stage looked like ordinary operation.

A worked example: EchoLeak

This is not theoretical. EchoLeak (CVE-2025-32711, CVSS 9.3), disclosed by Aim Labs in June 2025, is the first real-world zero-click prompt injection exploit confirmed in a production LLM system, Microsoft 365 Copilot.

Walk the stages. Initial access: an attacker sends one crafted email to a Copilot user. No click, no reply. When Copilot later pulls context to answer an unrelated question, it ingests the email's hidden instructions. Tool misuse: the injected text directs Copilot to retrieve internal content within its scope, including OneDrive files, SharePoint, Teams messages, and chat history. Exfiltration: the payload chained several bypasses, evading Microsoft's cross-prompt-injection classifier, using reference-style markdown to dodge link redaction, and abusing auto-fetched images and a Teams proxy permitted by the content security policy, so the harvested data left for an attacker-controlled server without user interaction.

EchoLeak did not move money. But replace "read internal files" with "submit a payment" and the chain is identical. Microsoft patched it server-side and reported no exploitation in the wild, which is the right outcome and also the reason most teams have never seen one of these in their own logs. Absence of incidents is not absence of exposure.

A second example shows persistence and lateral movement directly. In CVE-2025-53773, prompt injection planted in source-code comments or project files instructs GitHub Copilot to write "chat.tools.autoApprove": true into a workspace .vscode/settings.json. That one config write disables the human approval step for every subsequent tool call, converting a single injection into standing, unattended execution. The same move against a money-moving agent would be an instruction that quietly relaxes an approval threshold, so the transfer two steps later never prompts anyone.

What the chain tells you about defense

The structure carries a clear lesson: there is no single choke point, because the agent is doing authorized work at every stage. You cannot inspect your way to safety with one filter at the input.

So the controls in this course attach to specific stages. Authentication and intent separation (module 2) attack stage 1 by refusing to treat ingested content as instruction. Scoped, short-lived credentials (module 3) shrink what stages 2 through 4 can reach. The tool gateway and least agency (module 6) sit on stage 5, where the irreversible call happens. Human-in-the-loop checkpoints (module 7) force a real person onto the transfer, and tamper-evident audit (module 8) makes the misused-tool-call problem visible after the fact even when it looked normal in the moment.

Treat the agent's full tool surface as attacker-reachable, and assume any content the agent reads can carry instructions. Both assumptions are now backed by confirmed production exploits.

The takeaway is not that agents are unsafe to build. It is that a money-moving agent inherits an attack chain where every stage uses your own permissions against you. You break it by placing controls at the stages, not by trusting the model to tell good instructions from bad ones. It cannot, and it was never designed to.

← Previous

Signed Mandates: Cryptographic Permission to Spend with AP2

Tool Gateways and Least Agency