The financial services industry is going through a transformation that most people outside it do not fully appreciate. Machine learning models now make or heavily influence decisions about who gets a mortgage, who pays what interest rate, who gets approved for a car loan, and who gets flagged as a fraud risk. These are not edge cases or pilot programs. They are the operating reality of how American financial institutions make decisions about hundreds of millions of people today.
And most of these models are black boxes.
Not in the sense that they are intentionally opaque. In the sense that even the teams that built them often cannot fully explain why a specific individual was denied or approved. The model learned patterns from historical data that map inputs to outputs through mathematical relationships too complex for human intuition to follow. It works in aggregate. It falls apart as an explanation for any individual case.
That is a problem. It is a growing legal problem, a regulatory problem, and most importantly, a fairness problem.
The adverse action notice gap
When a bank denies your mortgage application, you have a legal right to know why. The Equal Credit Opportunity Act and the Fair Credit Reporting Act both require lenders to provide adverse action notices: specific, substantive reasons for the denial. Not “your application did not meet our criteria.” Specific reasons.
This requirement was written for a world of human underwriters and rule-based scorecards. It is being applied to a world of gradient-boosted ensembles with 500 features, deep learning models trained on alternative data, and stacked model architectures where the output of one model becomes the input to another.
The gap between what the law requires and what current model architectures can easily provide is not a small one. Regulators at the CFPB, OCC, and Federal Reserve are actively grappling with it. Banks are navigating it with varying degrees of rigor. And the individuals being denied credit are left with adverse action notices that often technically comply with the letter of the law while failing entirely to provide the substantive explanation they are entitled to.
What explainability actually means in practice
Explainability in AI is not one thing. It is a spectrum of techniques with different uses, different costs, and different audiences. Understanding this matters because “we need explainable AI” is often invoked without clarity on what level of explainability is actually needed for what purpose.
For regulatory compliance, local interpretability and counterfactual explanations are what matter most. Consumers and examiners do not need to understand the full model architecture. They need to understand specific decisions and what, concretely, could change them.
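To make the counterfactual half of that concrete, here is a minimal sketch of finding the smallest change in one input that flips a credit decision. The synthetic data, feature names, logistic regression model, and single-feature search are illustrative assumptions for this example, not a production recourse method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic applicants: debt-to-income ratio, employment tenure (years), credit utilization.
X = rng.uniform([0.05, 0.0, 0.0], [0.65, 20.0, 1.0], size=(5000, 3))
# Synthetic approvals: lower DTI and utilization and longer tenure make approval more likely.
y = (0.5 - X[:, 0] + 0.02 * X[:, 1] - 0.3 * X[:, 2]
     + rng.normal(0, 0.1, 5000) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# One denied applicant: high DTI, short tenure, high utilization.
applicant = np.array([[0.52, 1.5, 0.80]])
print("current decision:", "approved" if model.predict(applicant)[0] else "denied")

# Counterfactual search over a single feature: step debt-to-income down
# until the model's decision flips, and report the smallest such change.
for new_dti in np.arange(applicant[0, 0], 0.0, -0.01):
    candidate = applicant.copy()
    candidate[0, 0] = new_dti
    if model.predict(candidate)[0] == 1:
        print(f"decision flips if debt-to-income falls from "
              f"{applicant[0, 0]:.2f} to roughly {new_dti:.2f}")
        break
```

A real recourse system would search over multiple features at once and respect which inputs an applicant can actually change, but the shape of the explanation is the same: here is the decision, and here is what would alter it.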
This is where SHAP (Shapley additive explanations) values become practically important. SHAP allows you to decompose a model’s prediction into the contribution of each input feature for a specific observation. You can tell a specific applicant that their application was denied primarily because their debt-to-income ratio exceeded a threshold, that a shorter employment tenure also contributed, and that improving those two factors would likely change the outcome. That is what a substantive adverse action notice looks like.
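Here is a minimal sketch of that decomposition, assuming an XGBoost classifier and the open-source shap package; the synthetic data, feature names, and decision framing are illustrative, and output conventions can vary slightly between shap versions.

```python
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(42)
feature_names = ["debt_to_income", "employment_years", "credit_utilization"]

# Synthetic applicants and approval labels (same illustrative setup as above).
X = rng.uniform([0.05, 0.0, 0.0], [0.65, 20.0, 1.0], size=(5000, 3))
y = (0.5 - X[:, 0] + 0.02 * X[:, 1] - 0.3 * X[:, 2]
     + rng.normal(0, 0.1, 5000) > 0).astype(int)

model = xgb.XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X, y)

# Decompose one denied applicant's score into per-feature contributions
# (in log-odds) and list the features pushing the decision toward denial.
applicant = np.array([[0.52, 1.5, 0.80]])
explainer = shap.TreeExplainer(model)
contributions = explainer.shap_values(applicant)[0]   # one value per feature

print("approval probability:", model.predict_proba(applicant)[0, 1])
for name, value in sorted(zip(feature_names, contributions), key=lambda kv: kv[1]):
    if value < 0:
        print(f"{name}: pushed the score down by {abs(value):.2f} log-odds")
```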
The accuracy-explainability tradeoff is mostly a myth
One of the most persistent misunderstandings in this space is the idea that you have to choose between accurate models and explainable models. That if you want a model regulators can understand, you have to sacrifice performance. This was somewhat true ten years ago. It is largely not true now.
A well-tuned logistic regression or scorecard model, with careful feature engineering, often comes within two or three AUC points of a complex ensemble on credit scoring tasks, a difference frequently smaller than the uncertainty in the training data itself. Meanwhile, the cost of compliance, explainability, and regulatory defense for the complex model can be substantially higher. Sometimes the most accurate model, once total cost is accounted for, is the interpretable one.
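The comparison itself is easy to run. A rough sketch on synthetic data follows; the dataset and hyperparameters are illustrative, and real portfolios will show their own gaps, but the exercise is the same one a model risk team would perform.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced, credit-like classification task.
X, y = make_classification(n_samples=20000, n_features=20, n_informative=8,
                           weights=[0.85], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "logistic scorecard": make_pipeline(StandardScaler(),
                                        LogisticRegression(max_iter=1000)),
    "boosted ensemble": GradientBoostingClassifier(n_estimators=300, max_depth=3),
}

# Fit both models and compare discrimination on held-out data.
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```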
Where complexity is genuinely needed, XGBoost with SHAP-based explainability has become a practical standard for financial applications precisely because it delivers near-state-of-the-art performance while remaining explainable at the individual decision level.
What I am building toward
At IBM, my work embeds explainability into the model development lifecycle from the beginning. That means feature engineering with explicit regulatory justification for every input variable. It means model selection that accounts for interpretability requirements alongside accuracy benchmarks. It means automated adverse action reason generation that produces compliant, substantive explanations at scale.
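To give a flavor of that last piece, here is a simplified sketch of turning per-applicant SHAP contributions (like those in the earlier example) into adverse action reasons in batch. The mapping from features to reason language is hypothetical and for illustration only, not a compliant reason-code library.

```python
import numpy as np

# Hypothetical mapping from model features to adverse action reason language.
REASON_TEXT = {
    "debt_to_income": "Debt-to-income ratio too high",
    "employment_years": "Length of employment too short",
    "credit_utilization": "Proportion of balances to credit limits too high",
}

def adverse_action_reasons(shap_matrix, feature_names, denied_mask, top_n=2):
    """For each denied applicant, return up to top_n reasons drawn from the
    features with the most negative SHAP contributions (pushing toward denial)."""
    notices = []
    for contributions, denied in zip(shap_matrix, denied_mask):
        if not denied:
            notices.append([])
            continue
        order = np.argsort(contributions)            # most negative first
        picked = [feature_names[i] for i in order[:top_n] if contributions[i] < 0]
        notices.append([REASON_TEXT[name] for name in picked])
    return notices

# Illustrative usage, continuing the SHAP sketch above:
#   shap_matrix = explainer.shap_values(X_batch)
#   denied = model.predict_proba(X_batch)[:, 1] < 0.5
#   notices = adverse_action_reasons(shap_matrix, feature_names, denied)
```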
The regulatory direction here is clear. The EU AI Act classifies credit scoring as high-risk AI. The CFPB has issued guidance on algorithmic adverse action notices. The OCC’s model risk guidance is increasingly explicit about explainability requirements. The United States will follow Europe’s lead on this. The question is timing, not direction.
Financial institutions that invest in explainable AI now are not just managing regulatory risk. They are building systems that are more trustworthy, more auditable, and fundamentally more fair. That alignment between compliance and ethics is not always available in this field. When it is, you should take it.
The argument for explainable AI in finance is not primarily a compliance argument. It is an argument about what it means to make consequential decisions about people’s lives responsibly. The compliance requirement is the floor. The actual goal is higher than that.
This is the first in a series on AI governance in financial services. Next: building risk intelligence systems that institutions can actually trust in production.