Teams building AI for credit, pricing, and risk tend to assume the audit risk lives in the model. They prepare to defend the architecture, the validation report, the performance metrics. Then the examiner opens with questions about a column in a table.
This is not a misunderstanding of how the system works. It is the correct order of operations. A model is a function of its inputs, and an examiner who cannot trust the inputs has no basis to evaluate anything downstream. The data is the first target because everything else depends on it being right.
The gate that comes before model analysis
The clearest statement of this sequence is in the Interagency Fair Lending Examination Procedures, the shared playbook used by the OCC, Federal Reserve, FDIC, and NCUA. When examiners assess a lender, they first validate the accuracy of the institution's submitted data, typically by pulling a sample of loan records and checking each one against the underlying loan file.
The procedures are explicit about what happens when those records do not match. If the reported data are inconsistent with the loan files, examiners may not be able to proceed with a fair-lending analysis until the data have been corrected. The institution is directed to restore data integrity through scrubbing, monitoring, and training before the substantive review continues.
Read that as a hard gate. Comparative file review, the core of a fair-lending exam, compares how similarly qualified applicants in a protected class were treated relative to a control group. That comparison is meaningful only if the data describing both groups is accurate. Bad data does not produce a wrong fairness finding. It produces no finding at all, plus a remediation order and a delayed exam.
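The sample-and-compare step can be sketched in a few lines. This is a minimal illustration, not the examiners' actual method: it assumes reported records and loan-file records are both available as dictionaries keyed by a loan ID, and the field names, sample size, and error tolerance are hypothetical values chosen for the example rather than figures taken from the examination procedures.

```python
import random
from dataclasses import dataclass


@dataclass
class SampleResult:
    sampled: int
    records_with_errors: int
    error_rate: float
    proceed: bool  # True only if the sampled error rate is within tolerance


def validate_sample(reported, loan_files, fields, sample_size, max_error_rate, seed=0):
    """Check a random sample of reported records against the loan files.

    reported, loan_files: dict mapping loan_id -> dict of field values.
    A sampled record counts as an error if any checked field disagrees
    or if no loan file exists for it.
    """
    rng = random.Random(seed)
    ids = sorted(reported)
    sample = rng.sample(ids, min(sample_size, len(ids)))
    if not sample:
        # Nothing was verified, so there is no basis to proceed.
        return SampleResult(0, 0, 0.0, False)
    errors = 0
    for loan_id in sample:
        file_record = loan_files.get(loan_id)
        if file_record is None or any(
            reported[loan_id].get(f) != file_record.get(f) for f in fields
        ):
            errors += 1
    rate = errors / len(sample)
    return SampleResult(len(sample), errors, rate, rate <= max_error_rate)


# Hypothetical records: loan L2 was reported with transposed income digits.
reported = {
    "L1": {"income": 50000, "decision": "approve"},
    "L2": {"income": 42000, "decision": "deny"},
    "L3": {"income": 61000, "decision": "approve"},
    "L4": {"income": 38000, "decision": "deny"},
}
loan_files = {
    "L1": {"income": 50000, "decision": "approve"},
    "L2": {"income": 24000, "decision": "deny"},
    "L3": {"income": 61000, "decision": "approve"},
    "L4": {"income": 38000, "decision": "deny"},
}
result = validate_sample(reported, loan_files, ["income", "decision"],
                         sample_size=4, max_error_rate=0.05)
print(result)
# SampleResult(sampled=4, records_with_errors=1, error_rate=0.25, proceed=False)
```

Two choices are deliberate here: a sampled record with no matching loan file counts as an error, and an empty sample returns `proceed=False`, since in both cases nothing was actually verified.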
Why this generalizes beyond fair lending
The same logic runs through model-risk supervision. The 2011 interagency guidance on model risk management, issued by the Federal Reserve as SR 11-7 and by the OCC as Bulletin 2011-12, treats data quality as a first-order control, not a footnote to model validation. It describes a model as having input, processing, and reporting components, expects all of them to be subject to validation, and calls for rigorous assessment of the quality and relevance of the data a model is built on.
So whether the lens is fair lending or safety and soundness, the supervisory instinct is the same. Prove the inputs before you argue about the function that consumes them.
What "the data" means in an AI system
For a traditional scorecard, the data under review is a short list of structured fields. For an AI system, the surface is much larger, and that is exactly why examiners now lead with it.
The CFPB made this concrete in Circular 2022-03 and again in Circular 2023-03, both on adverse-action notices. The Bureau warned that complex algorithms sometimes rely on data harvested from consumer surveillance or data not typically found in a credit file, some of which may not intuitively relate to the likelihood of repayment. The Bureau flagged those inputs as a heightened consumer-protection risk, and it made clear that creditors cannot escape their obligation to provide the specific principal reasons for a denial just because the model is complex.
In practice, the data surface an examiner can audit first includes every input that reaches the model: application fields, bureau attributes, derived features, third-party and alternative data, and anything pulled at inference time. Each of those is a place where an examiner can ask the question that stalls everything else: where did this value come from, and can you prove it is correct?
A worked example
A lender ships an AI underwriting model. The validation deck is strong: stable performance, documented testing, a clean fairness analysis run on the modeling dataset. The team expects the review to center on the model.
The examiner instead samples 40 booked and declined applications and traces a handful of input features back to source. One feature, "months at current employer," is built by an upstream ETL job. For roughly 12 percent of the sample, the value in the feature store does not match the value in the application of record. The pipeline had been silently coercing blank employment fields to zero.
Now the problem is not a fairness gap. It is that the lender cannot demonstrate the model was scored on accurate inputs. The fairness analysis was run on the same flawed feature, so it compared applicants on values their applications never contained and says nothing reliable about whether similarly qualified applicants were treated alike. The examiner cannot proceed, issues a data-integrity finding, and the substantive review pauses pending remediation. The strong model was never the question.
That single mismatched column did more damage than any modeling choice, because it broke the foundation the rest of the audit stands on.
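The defect at the center of this example takes one line to write. The sketch below, with hypothetical function names, sets a transform that coerces blanks to zero beside one that preserves the missing value. How the model should then treat a missing value (imputation, a missing-indicator feature, or a hard stop) is a separate modeling decision and is not shown.

```python
from typing import Optional


def months_at_employer_buggy(raw: Optional[str]) -> int:
    # Reproduces the defect: a blank or absent field silently becomes 0,
    # which is indistinguishable from an applicant who just started a job.
    return int(raw) if raw and raw.strip() else 0


def months_at_employer(raw: Optional[str]) -> Optional[int]:
    # Preserves missingness: a blank or absent field stays None, so the model
    # and any reconciliation check can see the value was never provided.
    if raw is None or not raw.strip():
        return None
    value = int(raw.strip())
    if value < 0:
        raise ValueError(f"months at employer cannot be negative: {value}")
    return value


for raw in ["24", "", None]:
    print(repr(raw), months_at_employer_buggy(raw), months_at_employer(raw))
# '24' 24 24
# '' 0 None
# None 0 None
```

Both versions agree on populated fields, which is why this kind of bug can pass ordinary testing and surface only when someone compares stored values with the application of record.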
What this asks of you before an exam
The takeaway is to build for the first question, not the last one. Three things make the data layer examinable.
First, traceability. For any input feature, you should be able to trace a specific value, for a specific decision, back to its source of record in under a minute. That is, in effect, the test examiners apply when they check sampled records against the loan file, and it is the test you should apply to yourself.
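One way to make that one-minute trace possible is to write a lineage record at scoring time for every feature of every decision. The sketch below is an in-memory stand-in under that assumption; the class names, fields, and identifiers are hypothetical, and a production system would persist these records in a durable store rather than a dictionary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FeatureLineage:
    decision_id: str
    feature: str
    value: object           # the value the model actually consumed
    source_system: str      # hypothetical name of the system of record
    source_record_id: str   # key of the originating record in that system
    transform: str          # job name and version that produced the value
    captured_at: datetime


class LineageLog:
    """In-memory stand-in for a lineage store keyed by (decision_id, feature)."""

    def __init__(self):
        self._entries = {}

    def record(self, entry: FeatureLineage) -> None:
        key = (entry.decision_id, entry.feature)
        if key in self._entries:
            # Lineage is written once per decision; overwriting would hide history.
            raise ValueError(f"lineage already recorded for {key}")
        self._entries[key] = entry

    def trace(self, decision_id: str, feature: str) -> FeatureLineage:
        try:
            return self._entries[(decision_id, feature)]
        except KeyError:
            raise LookupError(
                f"no lineage for feature {feature!r} on decision {decision_id!r}"
            ) from None


log = LineageLog()
log.record(FeatureLineage(
    decision_id="D-1001",
    feature="months_at_current_employer",
    value=None,
    source_system="application_intake",
    source_record_id="APP-88231",
    transform="employment_etl v2.3",
    captured_at=datetime(2024, 3, 5, 14, 2, tzinfo=timezone.utc),
))
hit = log.trace("D-1001", "months_at_current_employer")
print(hit.source_system, hit.source_record_id, hit.transform)
# application_intake APP-88231 employment_etl v2.3
```

A failed lookup raises rather than returning a default, on the reasoning that a feature with no lineage is itself the finding.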
Second, reconciliation. The accuracy check examiners run is one you can run continuously. Sample scored decisions, compare the feature values the model actually consumed against the source of record, and track the mismatch rate as a monitored metric rather than discovering it during a sample pull.
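A minimal sketch of that check, assuming the consumed feature values and the source-of-record values can both be retrieved by decision ID. The feature names, sample size, and alert threshold are hypothetical. It reports a mismatch rate per feature, so a single broken pipeline, like the employment field in the worked example, shows up as its own signal instead of being diluted across all inputs.

```python
import random
from collections import Counter


def mismatch_rates(scored, source_of_record, features, sample_size, seed=None):
    """Sample scored decisions and compare consumed values with the source of record.

    scored: dict decision_id -> dict of feature values the model consumed.
    source_of_record: dict decision_id -> dict of the corresponding source values.
    Returns a dict feature -> fraction of sampled decisions where the values differ.
    """
    rng = random.Random(seed)
    ids = sorted(scored)
    sample = rng.sample(ids, min(sample_size, len(ids)))
    if not sample:
        raise ValueError("no scored decisions to sample")
    mismatches = Counter()
    for decision_id in sample:
        source = source_of_record.get(decision_id, {})
        for f in features:
            # A missing source value counts as a mismatch unless the
            # consumed value is also missing.
            if scored[decision_id].get(f) != source.get(f):
                mismatches[f] += 1
    return {f: mismatches[f] / len(sample) for f in features}


def breaches(rates, threshold):
    """Features whose mismatch rate exceeds the alert threshold."""
    return sorted(f for f, r in rates.items() if r > threshold)


scored = {
    "D1": {"months_at_current_employer": 0, "income": 52000},
    "D2": {"months_at_current_employer": 36, "income": 61000},
    "D3": {"months_at_current_employer": 0, "income": 47000},
    "D4": {"months_at_current_employer": 12, "income": 39000},
}
source = {
    "D1": {"months_at_current_employer": None, "income": 52000},  # blank on the application
    "D2": {"months_at_current_employer": 36, "income": 61000},
    "D3": {"months_at_current_employer": 0, "income": 47000},     # genuinely a new hire
    "D4": {"months_at_current_employer": 12, "income": 39000},
}
rates = mismatch_rates(scored, source, ["months_at_current_employer", "income"],
                       sample_size=4, seed=7)
print(rates)                             # {'months_at_current_employer': 0.25, 'income': 0.0}
print(breaches(rates, threshold=0.02))   # ['months_at_current_employer']
```

Run on a schedule and emitted to a monitoring system, the per-feature rates become the tracked metric the text describes.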
Third, scope honesty. Inventory every input that reaches the model, including third-party and alternative data, and be ready to say what each one is and why it is there. The CFPB's interest in non-intuitive inputs means an undocumented feature is a finding waiting to happen.
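The inventory can be checked mechanically against the list of inputs the model actually consumes. This sketch assumes that list can be extracted from the model or feature pipeline. The categories mirror the input types listed earlier, and the feature names and the rule that every entry needs a source and a rationale are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class InputCategory(Enum):
    APPLICATION = "application field"
    BUREAU = "bureau attribute"
    DERIVED = "derived feature"
    THIRD_PARTY = "third-party or alternative data"
    INFERENCE_TIME = "pulled at inference time"


@dataclass(frozen=True)
class InventoryEntry:
    name: str
    category: InputCategory
    source: str
    rationale: str  # why the input is there and how it relates to the decision


def audit_inventory(model_inputs, inventory):
    """Return (undocumented, incomplete) feature names.

    undocumented: inputs the model consumes that have no inventory entry.
    incomplete: consumed inputs whose entry lacks a source or a rationale.
    """
    documented = {e.name: e for e in inventory}
    undocumented = sorted(set(model_inputs) - documented.keys())
    incomplete = sorted(
        e.name for e in inventory
        if e.name in model_inputs and not (e.source.strip() and e.rationale.strip())
    )
    return undocumented, incomplete


# Hypothetical inventory and model inputs.
inventory = [
    InventoryEntry("income", InputCategory.APPLICATION, "application_intake", "Capacity to repay"),
    InventoryEntry("months_at_current_employer", InputCategory.DERIVED, "employment_etl",
                   "Employment stability"),
    InventoryEntry("device_type", InputCategory.THIRD_PARTY, "vendor_feed", ""),
]
model_inputs = ["income", "months_at_current_employer", "device_type", "email_domain_age"]
undocumented, incomplete = audit_inventory(model_inputs, inventory)
print(undocumented)  # ['email_domain_age']
print(incomplete)    # ['device_type']
```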
The model is where the value is. The data is where the audit starts. Teams that internalize that order spend their exam defending a system, and teams that do not spend it reconciling a spreadsheet. The rest of this course works through the specific data controls, from lineage to consent to point-in-time correctness, that turn "trust us" into "here is the record."