Documentation and the Accountability Chain · Model Risk Management for LLMs

A model file is the only thing an examiner can actually inspect. They cannot watch you reason, sit in your validation meetings, or read the Slack thread where someone decided a 4 percent regression was acceptable. They read the file. If the file does not say it, it did not happen.

That standard has not changed since SR 11-7, and it does not soften under the 2026 framework. What changes for LLMs is that the file now has to defend decisions about a model you did not build, cannot inspect, and do not control. The accountability chain is how you make those decisions defensible: every material judgment traces to a named owner who signed off with the evidence in front of them.

The "unfamiliar reader" standard

SR 11-7 set the bar that still governs the file. Documentation of model development and validation must be complete enough that "parties unfamiliar with a model can understand how the model operates, its limitations, and its key assumptions." Write the file for someone who joins the team in 18 months, after the people who built the system have left.

This standard does real work for LLMs. You cannot document the model's internals, so the file has to compensate by being exhaustive about everything you can observe: the provider, the exact model version and snapshot date, the prompt and system instructions in force, the retrieval sources, the guardrails, and the eval results that justified deployment.

The failure mode is a file that documents intent instead of decisions. "The model was tested for bias" is intent. "We ran the 480-case fairness eval on 2026-04-12 against claude-haiku-4-5-20251001, observed a 2.1-point approval-rate gap by protected class, judged it within the 3-point tolerance set by the second line, owner J. Okafor" is a decision. Examiners want the second kind.
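One way to keep entries in the "decision" form is to make the record a structure whose fields cannot be left out. The sketch below is illustrative only: the class and field names are assumptions, not a prescribed schema, and the values are the ones from the fairness example in the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DecisionRecord:
    """One material judgment, written so an unfamiliar reader can audit it.

    Hypothetical schema for illustration; adapt the fields to your own file.
    """
    eval_name: str
    case_count: int
    run_date: date
    model_version: str
    observed_gap_points: float  # measured result, in points as quoted in the file
    tolerance_points: float     # limit agreed before the run
    tolerance_set_by: str
    owner: str

    def within_tolerance(self) -> bool:
        return self.observed_gap_points <= self.tolerance_points

    def summary(self) -> str:
        verdict = "within" if self.within_tolerance() else "outside"
        return (
            f"{self.eval_name} ({self.case_count} cases) run "
            f"{self.run_date.isoformat()} against {self.model_version}: "
            f"observed {self.observed_gap_points} vs tolerance "
            f"{self.tolerance_points} set by {self.tolerance_set_by}; "
            f"{verdict} tolerance; owner {self.owner}"
        )


record = DecisionRecord(
    eval_name="fairness eval",
    case_count=480,
    run_date=date(2026, 4, 12),
    model_version="claude-haiku-4-5-20251001",
    observed_gap_points=2.1,
    tolerance_points=3.0,
    tolerance_set_by="second line",
    owner="J. Okafor",
)
print(record.summary())
```

Because every field is required, an entry that only states intent cannot be constructed: there is no way to record "tested for bias" without a date, a model version, a result, a tolerance, and an owner.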

What the file has to contain

A defensible LLM model file covers the same elements SR 11-7 has always required, adapted to a system you access rather than own.

Purpose, scope, and tier

State what the model decides, what it is allowed to touch, and the materiality tier that sets how much scrutiny it gets. The tier (covered in module 3) is the load-bearing field, because it justifies every later choice about validation depth and monitoring frequency. A tier-1 underwriting assistant and a tier-3 internal drafting tool should not carry the same file, and the tier is the reason.

The exact thing in production

Record the provider, model name, version snapshot, and the date that version went live. For a hosted LLM this is the closest you get to "what the model is," and version drift is the most common way a documented model and the production model quietly diverge. Bind the documentation to the production version so a model card and the running system cannot disagree silently.
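A minimal sketch of that binding, assuming you can obtain the identity of the running model from your own deployment configuration or request logs. The `ModelIdentity` type, the `find_drift` helper, and all example values are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ModelIdentity:
    provider: str
    model_name: str
    snapshot: str
    live_date: date  # recorded in the file; not compared, production may not report it


def find_drift(documented: ModelIdentity, running: ModelIdentity) -> list[str]:
    """Return one message per identity field where the file and production disagree."""
    mismatches = []
    for field_name in ("provider", "model_name", "snapshot"):
        doc_value = getattr(documented, field_name)
        run_value = getattr(running, field_name)
        if doc_value != run_value:
            mismatches.append(
                f"{field_name}: file says {doc_value!r}, production reports {run_value!r}"
            )
    return mismatches


# Placeholder values, not real provider or snapshot identifiers.
documented = ModelIdentity("example-provider", "example-model", "snapshot-a", date(2026, 5, 3))
running = ModelIdentity("example-provider", "example-model", "snapshot-b", date(2026, 5, 3))

for message in find_drift(documented, running):
    print("DRIFT:", message)
```

Run on a schedule or as a deployment check, a non-empty result is the signal that the file and the running system have diverged and that one of them needs a change ticket.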

Assumptions, limitations, and known failure modes

This is where LLM files earn their keep. Document the assumptions you made because you could not see inside: that the provider's training data is broadly current to a stated cutoff, that the context window holds the full policy you injected, that the model has no memory across sessions unless you built it. List the known failure modes (hallucinated citations, prompt-injection exposure, refusal on edge cases) and what you do about each.

Evidence and approval history

Every claim in the file points to an artifact: the eval run, the validation report, the monitoring dashboard, the change ticket. And every material decision carries a name and a date. The approval history is the spine of the accountability chain, because it shows who accepted which risk, when, and on what evidence.
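The two rules in this paragraph can be checked mechanically before an examiner does it by hand. The sketch below assumes a hypothetical `FileEntry` structure and placeholder artifact paths; it flags any claim without an artifact and any material decision without a name and a date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FileEntry:
    statement: str
    artifact_ref: Optional[str] = None  # eval run, report, dashboard, or ticket
    material_decision: bool = False
    owner: Optional[str] = None
    decided_on: Optional[date] = None


def find_gaps(entries: list[FileEntry]) -> list[tuple[str, list[str]]]:
    """Return (statement, missing fields) for every entry that is not defensible."""
    gaps = []
    for entry in entries:
        missing = []
        if not entry.artifact_ref:
            missing.append("artifact")
        if entry.material_decision:
            if not entry.owner:
                missing.append("owner")
            if entry.decided_on is None:
                missing.append("date")
        if missing:
            gaps.append((entry.statement, missing))
    return gaps


# Artifact references below are placeholders, not real paths.
entries = [
    FileEntry("Fairness gap within tolerance", "evals/fairness-run", True,
              "J. Okafor", date(2026, 4, 12)),
    FileEntry("The model was tested for bias"),
    FileEntry("Accepted for production use", "reports/validation", True),
]
gaps = find_gaps(entries)
for statement, missing in gaps:
    print(f"{statement!r} is missing: {', '.join(missing)}")
```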

The accountability chain, by line

The three-lines structure tells the examiner who owns what, and it is the first thing they reconstruct from the file.

First line (developers and users) owns the model: its build, its documentation, and its day-to-day performance. They write the development record and run the system.
Second line (model risk management) owns independent validation and effective challenge. They set the tolerances, sign the validation report, and are accountable for saying no.
Third line (internal audit) owns assurance over the whole framework. They test whether the first two lines did what the file claims.

The chain breaks when the same person plays two roles. If the developer who built the retrieval pipeline also signed the validation that approved it, the file shows a control that does not exist. Examiners look for exactly this. Name different people, and make the conflict-of-interest separation visible in the approval history.
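A check like this is easy to run over the approval history itself. The sketch below assumes sign-offs are available as (person, line) pairs for a single model; the function name and the placeholder people are illustrative, not part of any standard.

```python
from collections import defaultdict


def find_role_conflicts(sign_offs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each person who signed in more than one line to the lines they signed in."""
    lines_by_person: dict[str, set[str]] = defaultdict(set)
    for person, line in sign_offs:
        lines_by_person[person].add(line)
    return {
        person: sorted(lines)
        for person, lines in lines_by_person.items()
        if len(lines) > 1
    }


# Placeholder names: one clean history, one where the builder validated their own work.
clean_history = [("Person A", "first"), ("Person B", "second")]
broken_history = [("Person A", "first"), ("Person A", "second")]

print(find_role_conflicts(clean_history))
print(find_role_conflicts(broken_history))
```

An empty result does not prove independence, since reporting lines and informal influence are not captured, but a non-empty one is exactly the finding an examiner would write up.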

Senior management and the board sit above the three lines and stay accountable for model-driven decisions even when the model is outsourced and highly automated. "The vendor's model did it" has never been a defense, and it is weaker still for an LLM, where the vendor will not show you the model.

A worked example

A bank deploys an LLM-based AML triage assistant that drafts narratives for suspicious-activity review. Tier 1, because a missed alert is a regulatory and legal exposure.

The file shows: purpose and tier on page one; the production model as claude-haiku-4-5 at a named snapshot, live 2026-05-03; the system prompt and the typology rules injected at runtime, both version-controlled; assumptions, including that the assistant drafts but a human analyst always disposes; and the limitation that the model can fabricate a transaction detail, with the mitigation that every drafted narrative is checked against source records before filing.

Evidence: a 600-case eval on labeled historical alerts, run 2026-04-28, with precision and recall against analyst ground truth, owner R. Mehta (first line). Validation: an independent report from the second line accepting the model for tier-1 use on the condition that the false-negative rate is monitored weekly, owner S. Lindqvist, dated 2026-05-01. Change log: the snapshot upgrade on 2026-05-03 has a ticket, a re-run of the eval, and a second-line re-approval, all completed before it went live.
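The change-log rule in this example (ticket, eval re-run, and second-line re-approval before go-live) can be expressed as a simple gate. This is a sketch under stated assumptions: the `ChangeRecord` fields and the ticket identifier are hypothetical, and the re-run and re-approval dates are invented for illustration because the example gives only the 2026-05-03 upgrade date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    description: str
    go_live: date
    ticket_id: Optional[str] = None
    eval_rerun_on: Optional[date] = None
    second_line_reapproved_on: Optional[date] = None


def blocking_issues(change: ChangeRecord) -> list[str]:
    """List every reason this change should not have gone (or should not go) live."""
    issues = []
    if not change.ticket_id:
        issues.append("no change ticket")
    if change.eval_rerun_on is None:
        issues.append("eval not re-run")
    elif change.eval_rerun_on > change.go_live:
        issues.append("eval re-run dated after go-live")
    if change.second_line_reapproved_on is None:
        issues.append("no second-line re-approval")
    elif change.second_line_reapproved_on > change.go_live:
        issues.append("second-line re-approval dated after go-live")
    elif (change.eval_rerun_on is not None
          and change.second_line_reapproved_on < change.eval_rerun_on):
        issues.append("re-approval predates the eval re-run it relies on")
    return issues


# Hypothetical ticket id and hypothetical re-run / re-approval dates.
upgrade = ChangeRecord("snapshot upgrade", date(2026, 5, 3), "CHG-PLACEHOLDER",
                       date(2026, 5, 2), date(2026, 5, 2))
unapproved = ChangeRecord("snapshot upgrade", date(2026, 5, 3), "CHG-PLACEHOLDER",
                          date(2026, 5, 2))

print(blocking_issues(upgrade))
print(blocking_issues(unapproved))
```

The ordering check matters as much as the presence check: a re-approval dated before the eval it supposedly relied on is a paper control, which is the same failure as a missing one.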

An examiner reading this file can answer the only questions that matter: what is in production, who decided it was safe, on what evidence, and who is on the hook if it fails. None of those answers require opening the model.

Takeaway

The file is the deliverable. Build it so that a stranger can understand the model's limits, every material decision carries a named owner and a date, and every claim points to an artifact you can produce on request. For an LLM you cannot open, the documentation and the accountability chain are the substitute for inspection, and a well-kept file is the difference between a finding and a clean exam.
