The Inventory Problem: Finding Every LLM Already Running in Your Bank · Model Risk Management for LLMs

Model risk management has one precondition that sits before validation, before tiering, before any control. You have to know what exists. A framework cannot be applied to a model that no team has declared, and most banks are carrying far more language-model exposure than their inventory shows. The gap is not a documentation lag. It is a structural blind spot, and it is the first thing that breaks when an examiner asks for a complete population.

The previous module covered how the guidance changed in 2026. This one assumes the harder problem is already in front of you: the population you are supposed to govern is partly invisible, and the usual model inventory was never built to catch it.

Why the inventory you have is wrong

Traditional model inventories were built around models the bank constructs and owns. A credit scorecard has a developer, a build date, a validation file, and a line in a spreadsheet. LLMs do not arrive that way. They arrive as features inside software you already license, as API calls from application code, and as browser tabs your staff open without asking anyone.

Three properties make them slip the net.

First, most LLM usage is consumption, not construction. No one built the model, so no one feels they own a model to declare. Second, the access is diffuse. A single procurement contract for a CRM can introduce a summarization model, a drafting model, and an agent, none of which appear on the contract. Third, the cost is small and per-call, so it never trips the capital-expenditure reviews that historically surfaced new systems.

The result is consistent across institutions. Surveys through 2025 put unsanctioned AI use among employees somewhere between roughly half and four-fifths depending on the sample, with one BlackFog survey finding 49 percent of workers had adopted AI tools without employer approval. Your inventory almost certainly reflects the AI you approved, not the AI you run.

Where the LLMs actually live

Treat discovery as four distinct surfaces, because each one needs a different tool and a different owner.

Embedded in third-party software

This is the largest and quietest category. The CRM added an email drafter. The contact-center platform added call summarization. The ticketing tool added a suggested-reply feature. None of these were bought as models, and the vendor often will not say which provider sits underneath. Your starting point is the software asset register and the procurement contracts, read specifically for AI feature flags and data-processing addenda.

Direct API integrations

Application teams calling api.openai.com, api.anthropic.com, Azure OpenAI, or Bedrock from their own code. These are real production models making real decisions, and they almost never appear in a model inventory because the team thinks of them as a software dependency.

Hyperscaler and platform services

Models invoked through a cloud account: Bedrock, Vertex, Azure AI. These leave a billing and IAM trail, which makes them more findable than embedded features if you know to look.

End-user shadow usage

Staff pasting into a consumer chatbot through the browser. Lower stakes per interaction, higher data-leakage risk, and the hardest to enumerate as discrete models. Here you are inventorying a behavior and a data flow more than a system.

A discovery playbook that actually finds them

No single scan covers all four surfaces. Run these in parallel and reconcile the outputs.

Network egress and CASB. Filter outbound traffic and CASB alerts for, at minimum, these known endpoints: api.openai.com, api.anthropic.com, claude.ai, huggingface.co, and your Azure OpenAI hostnames. This catches browser usage and direct API calls fast. The limit is real, though: CASB sees browser sessions and egress, but a backend service calling a provider through a managed gateway may not surface the same way, and neither control tells you anything about models running inside your own VPCs.
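As a rough illustration of the egress filter, the sketch below counts hits per source and destination against a small watchlist. The log format (whitespace-separated timestamp, source host, destination host) and the function names are assumptions made for this example, as is the suffix used for Azure OpenAI resource hostnames; adapt them to whatever your proxy or CASB actually exports.

```python
from collections import Counter

# Watchlist taken from the endpoints named above; extend it for your estate.
WATCHED_HOSTS = {"api.openai.com", "api.anthropic.com", "claude.ai", "huggingface.co"}
# Assumption: Azure OpenAI resources are reached at <resource>.openai.azure.com.
WATCHED_SUFFIXES = (".openai.azure.com",)


def is_llm_endpoint(host: str) -> bool:
    """True if host is a watched hostname, a subdomain of one, or a watched suffix."""
    host = host.strip().lower().rstrip(".")
    if host in WATCHED_HOSTS:
        return True
    if host.endswith(WATCHED_SUFFIXES):
        return True
    return any(host.endswith("." + watched) for watched in WATCHED_HOSTS)


def summarize_egress(lines):
    """Count hits per (source, destination).

    Assumes hypothetical log lines shaped like 'timestamp source dest_host'.
    Lines with fewer than three fields are skipped.
    """
    hits = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        source, dest = parts[1], parts[2]
        if is_llm_endpoint(dest):
            hits[(source, dest.lower())] += 1
    return hits
```

Grouping by source host rather than only counting destinations is deliberate: the point of the scan is to find an owner to talk to, not just to confirm that traffic exists.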

Cloud billing and IAM. Pull line items for Bedrock, Vertex, and Azure AI from cloud cost reports, then map back to the owning account and team. Billing is the most reliable signal you have for platform-hosted models because someone is paying per token.
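The billing pass can be sketched in a few lines. This is a minimal example under stated assumptions: the CSV columns (`service`, `account_id`, `cost`), the keyword list, and the account-to-team mapping are all hypothetical, and real cost exports from each cloud provider use their own schemas and service names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical keywords for spotting AI platform services in a cost export.
AI_SERVICE_KEYWORDS = ("bedrock", "vertex", "azure ai", "openai")


def ai_spend_by_owner(csv_text, account_owners):
    """Sum AI-service cost per (owning team, service).

    csv_text: a cost export with assumed columns service, account_id, cost.
    account_owners: dict mapping account_id to a team name.
    Accounts with no known owner are grouped under 'UNOWNED'.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        service = row["service"].strip()
        if any(keyword in service.lower() for keyword in AI_SERVICE_KEYWORDS):
            owner = account_owners.get(row["account_id"].strip(), "UNOWNED")
            totals[(owner, service)] += float(row["cost"])
    return dict(totals)
```

The `UNOWNED` bucket is the useful output here: spend on a model service with no mapped team is exactly the kind of gap the inventory exercise exists to close.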

Code and dependency scans. Search repositories for AI SDK dependencies in requirements.txt, package.json, and go.mod, and scan secrets managers and .env files for provider API keys. A live key is a live integration.
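A repository scan along these lines might look like the sketch below. It is deliberately crude: it treats any mention of "openai" or "anthropic" in a manifest as a lead, and it assumes provider keys sit in .env files under variable names containing the provider name and "KEY" (for example OPENAI_API_KEY). Both heuristics are assumptions for illustration and will miss keys stored under other names or in a secrets manager. The scan records the variable name only, never the key value.

```python
import os
import re

MANIFESTS = {"requirements.txt", "package.json", "go.mod"}
# Crude substring markers; expect false positives to triage by hand.
SDK_MARKERS = ("openai", "anthropic")
# Matches lines such as OPENAI_API_KEY=... that have a non-empty value.
KEY_NAME = re.compile(
    r"^[ \t]*(?:export[ \t]+)?"
    r"([A-Z0-9_]*(?:OPENAI|ANTHROPIC)[A-Z0-9_]*KEY[A-Z0-9_]*)[ \t]*=[ \t]*\S",
    re.MULTILINE,
)


def scan_repo(root):
    """Return (path, kind, detail) tuples for SDK dependencies and provider keys."""
    findings = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            is_env = name == ".env" or name.startswith(".env.")
            if name not in MANIFESTS and not is_env:
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                text = handle.read()
            if is_env:
                for variable in KEY_NAME.findall(text):
                    findings.append((path, "api_key", variable))
            else:
                lowered = text.lower()
                for marker in SDK_MARKERS:
                    if marker in lowered:
                        findings.append((path, "sdk_dependency", marker))
    return findings
```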

Procurement and vendor attestation. Cross-reference every SaaS contract against a short questionnaire: does this product use or embed an LLM, which provider, and does it train on our data. This is the only way to reach embedded features that leave no trace on your network.
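One way to keep the questionnaire answers comparable across vendors is to record them in a fixed structure and derive follow-ups mechanically. The sketch below assumes a hypothetical `VendorAttestation` record holding the three questions above; the field names and follow-up wording are illustrative, not a standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VendorAttestation:
    """One vendor's answers; None means the vendor has not answered."""
    product: str
    embeds_llm: Optional[bool] = None
    provider: Optional[str] = None
    trains_on_our_data: Optional[bool] = None


def follow_ups(attestation: VendorAttestation) -> List[str]:
    """List the open issues that still need chasing for this vendor."""
    issues = []
    if attestation.embeds_llm is None:
        issues.append("no answer on whether the product embeds an LLM")
    elif attestation.embeds_llm:
        if not attestation.provider:
            issues.append("underlying provider not disclosed")
        if attestation.trains_on_our_data is None:
            issues.append("no answer on training use of our data")
        elif attestation.trains_on_our_data:
            issues.append("vendor trains on our data")
    return issues
```

Treating a missing answer as an open issue, rather than as a "no", matches the point made earlier that vendors often will not say which provider sits underneath.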

A worked example

A regional bank holding a model inventory of 140 models, none of them LLMs, ran this exercise. The four scans returned the following.

Cloud billing showed a steady Bedrock charge tracing to a fraud-operations team using a model to summarize case notes. Egress filtering surfaced two engineering teams calling api.openai.com directly from internal tooling. A repository scan found an OpenAI key in a marketing automation service that auto-drafted customer emails. The procurement questionnaire returned that the licensed contact-center platform had quietly enabled call-summary generation across every agent.

That is four production LLM use cases, touching fraud, internal engineering tooling, customer communications, and customer service, none of which appeared in the model inventory the week before. Two of them were generating customer-facing text. The discovery work did not validate anything. It simply turned an unknown population of four into a known one, which is the only state from which tiering and validation can begin.
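The reconciliation step, merging scan outputs and comparing them to the declared inventory, can be sketched as follows. The identifiers and source labels are hypothetical, and matching on a normalized string is an assumption for brevity; in practice, agreeing on a common system identifier across scans is most of the work.

```python
from collections import defaultdict


def reconcile(findings, declared):
    """Return undeclared systems mapped to the scans that detected them.

    findings: iterable of (source, system_id) pairs from the discovery scans.
    declared: set of system_ids already present in the model inventory.
    Identifiers are compared after trimming whitespace and lowercasing.
    """
    sources_by_system = defaultdict(set)
    for source, system_id in findings:
        sources_by_system[system_id.strip().lower()].add(source)
    declared_normalized = {item.strip().lower() for item in declared}
    return {
        system: sorted(sources)
        for system, sources in sources_by_system.items()
        if system not in declared_normalized
    }


# Illustrative data only; these identifiers are made up for the example.
example_findings = [
    ("billing", "Fraud-Ops/case-note-summarizer"),
    ("egress", "fraud-ops/case-note-summarizer"),
    ("repo_scan", "marketing/email-drafter"),
    ("billing", "credit/scorecard-v3"),
]
gaps = reconcile(example_findings, declared={"credit/scorecard-v3"})
```

Keeping the list of detecting scans alongside each gap gives you a first measure of confidence: a system seen by both billing and egress is a firmer finding than one seen by a single heuristic.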

One nuance worth flagging

There is a live debate about whether generative and agentic models fall inside the strict letter of the latest supervisory guidance, and a bank's counsel will have a view on that. The inventory imperative does not depend on the answer. Whether or not a given LLM is a "model" under one definition, it is making or shaping decisions in production, and an examiner, a regulator, or a plaintiff will ask what it does and who signed off. You cannot answer for a system you have not found. Build the complete population first, then let the scoping arguments play out against a list that is actually complete.

Takeaway

The inventory problem is not a paperwork chore. It is the load-bearing first step, because every later control assumes a known population, and the population is partly hidden by design. Run all four discovery surfaces, reconcile them into one register, and treat any LLM you find as in scope until someone with authority and a paper trail decides otherwise. A model you have not found is a model you cannot tier, validate, or defend.
