The Post-Deployment Discipline

Six months after a healthcare AI model goes live is when the trouble shows up. Not at deploy. Not in QA. Six months in, when the team that built it has moved to the next project and the model is doing something quietly different from what it did on day one.

I promised the post-deployment list at the end of Issue #9, so here it is.

These are the five items I run on every model that has been in production for more than thirty days. None of them are glamorous. All of them have caught something that would have ended up in a clinical case review if they had been skipped.

If Issue #9 was the discipline that protects the model before it goes live, this is the discipline that protects the model from itself once it is already there.

𝗜𝗧𝗘𝗠 𝟭. 𝗧𝗛𝗘 𝗖𝗢𝗡𝗖𝗘𝗣𝗧 𝗗𝗥𝗜𝗙𝗧 𝗪𝗔𝗧𝗖𝗛

The model is fine until the population it is scoring stops looking like the population it was trained on.

That shift rarely arrives as a single event. It arrives as a slow change in the case mix walking through the door. A new referral pattern. A seasonal shift in acuity. A change in how upstream clinicians document a particular condition. The model keeps producing confident outputs. The outputs keep getting quietly less accurate.

The watch itself is unglamorous. I track three things on a monthly cadence: the input feature distribution, the prediction distribution, and the base rate of the outcome the model is supposed to predict. If any of the three moves past a threshold defined in advance, that is the trigger to re-evaluate.

A model that is being monitored for accuracy alone, without a concept drift watch underneath it, is being monitored too late.
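A minimal sketch of what that monthly watch could look like in Python. Everything specific here is an assumption for illustration: the Population Stability Index (PSI) as the distribution metric, the 0.2 PSI threshold, the 0.05 base-rate tolerance, and the names `psi` and `drift_watch`. A real deployment would set thresholds per feature and per model.

```python
import math
from typing import Dict, List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` against the `expected` baseline.

    Bins are equal-width over the baseline range; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def drift_watch(baseline: Dict[str, Sequence[float]],
                current: Dict[str, Sequence[float]],
                psi_threshold: float = 0.2,        # assumed rule of thumb
                base_rate_tolerance: float = 0.05  # assumed absolute tolerance
                ) -> List[str]:
    """Return the names of the watched signals that crossed their threshold."""
    triggers = []
    if psi(baseline["feature"], current["feature"]) > psi_threshold:
        triggers.append("input feature distribution")
    if psi(baseline["scores"], current["scores"]) > psi_threshold:
        triggers.append("prediction distribution")

    def base_rate(outcomes: Sequence[float]) -> float:
        return sum(outcomes) / len(outcomes)

    if abs(base_rate(current["outcomes"]) - base_rate(baseline["outcomes"])) > base_rate_tolerance:
        triggers.append("outcome base rate")
    return triggers
```

Called once a month with the training-time baseline and the latest month of production data, a non-empty return value is the prompt to re-evaluate. The sketch handles a single feature; in practice the same check would run per feature.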

𝗜𝗧𝗘𝗠 𝟮. 𝗧𝗛𝗘 𝗖𝗟𝗜𝗡𝗜𝗖𝗜𝗔𝗡 𝗢𝗩𝗘𝗥𝗥𝗜𝗗𝗘 𝗣𝗔𝗧𝗧𝗘𝗥𝗡

The most honest performance signal a deployed model has is the rate, the timing, and the reason behind clinician overrides.

Aggregate override rate is the headline. The pattern inside it is where the value sits. Are clinicians overriding the model on a specific patient subgroup? Are overrides clustering on certain shifts, certain units, or after certain upstream events? Are the overrides agreeing with each other in ways that suggest a systematic blind spot the model has not learned?

I read overrides the way an experienced reader reads negative reviews. The aggregate is noise. The clustering is the signal.

If the team running the model does not have a structured way to capture the reason a clinician overrode it, that team is flying without one of the two instruments it actually needs.
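The clustering check can be sketched in a few lines of Python, assuming each model-assisted decision is logged with the unit, shift, patient subgroup, and a structured override reason. The `Decision` fields, the `override_clusters` name, the minimum case count, and the 2x ratio are all hypothetical choices for illustration, not a standard.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Decision:
    """One model-assisted decision (hypothetical schema)."""
    unit: str
    shift: str
    subgroup: str
    overridden: bool
    reason: str = ""  # structured reason code captured at override time


def override_clusters(decisions: List[Decision], key: str,
                      min_cases: int = 20,
                      ratio: float = 2.0) -> Tuple[float, Dict[str, float]]:
    """Return the overall override rate and the groups whose rate is at
    least `ratio` times that overall rate.

    `key` is the Decision attribute to group by, e.g. "unit" or "subgroup".
    Groups with fewer than `min_cases` decisions are ignored as too small.
    """
    overall = sum(d.overridden for d in decisions) / len(decisions)
    totals: Dict[str, int] = defaultdict(int)
    overrides: Dict[str, int] = defaultdict(int)
    for d in decisions:
        group = getattr(d, key)
        totals[group] += 1
        overrides[group] += d.overridden

    flagged = {}
    for group, n in totals.items():
        rate = overrides[group] / n
        if n >= min_cases and overall > 0 and rate >= ratio * overall:
            flagged[group] = rate
    return overall, flagged
```

Running it once per grouping key (`"unit"`, `"shift"`, `"subgroup"`) gives a short list of places to read the override reasons first. A simple ratio test like this is only a screening step; a flagged group still needs a human review of the individual cases.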

𝗜𝗧𝗘𝗠 𝟯. 𝗧𝗛𝗘 𝗔𝗗𝗩𝗘𝗥𝗦𝗘 𝗘𝗩𝗘𝗡𝗧 𝗧𝗥𝗔𝗖𝗘-𝗕𝗔𝗖𝗞

If something goes wrong in clinical care and the model touched the decision, you need to be able to trace from the outcome back to the model version, the input features at the time of prediction, and the training data lineage of that version.

This is not a nice-to-have. This is the part of the system that determines whether the post-event review reads as a serious operator's response or as an institution scrambling.

In a Canadian provincial environment, the trace-back lives inside the institution's quality and safety framework, the privacy office's audit posture, and the model's own logging architecture. In the US, the same trace-back has to satisfy HIPAA, the institution's quality program, payer scrutiny, and an increasingly active set of state AI accountability regimes.

The mistake I see most often is that the trace-back is designed after the first event, not before. The trace-back you build under pressure is always worse than the one you build cold.
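One way to build the trace-back cold is to write a record at prediction time that already carries the three things the review will ask for: the model version, the input features as scored, and a pointer to the training data lineage. The sketch below is an in-memory illustration only. The `PredictionRecord` fields, the `PredictionLog` class, and the SHA-256 integrity check are assumptions, and a real system would sit on durable, access-controlled storage.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PredictionRecord:
    """What was scored, by which model, trained on what (hypothetical schema)."""
    prediction_id: str
    model_version: str
    training_data_ref: str      # identifier of the training data snapshot
    features: Dict[str, float]  # inputs exactly as seen at prediction time
    score: float
    scored_at: str              # ISO 8601 timestamp


class PredictionLog:
    """Append-only log with a digest per entry to detect later modification."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str, str]] = []

    def append(self, record: PredictionRecord) -> None:
        payload = json.dumps(asdict(record), sort_keys=True)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        self._entries.append((record.prediction_id, payload, digest))

    def trace_back(self, prediction_id: str) -> PredictionRecord:
        """Recover the full record behind a prediction linked to an outcome."""
        for pid, payload, digest in self._entries:
            if pid == prediction_id:
                if hashlib.sha256(payload.encode("utf-8")).hexdigest() != digest:
                    raise ValueError(f"record {pid} failed its integrity check")
                return PredictionRecord(**json.loads(payload))
        raise KeyError(prediction_id)
```

The design choice that matters is that the record is written when the prediction is made, not reconstructed afterwards. Feature values pulled from the source system after an event may no longer match what the model actually saw.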

𝗜𝗧𝗘𝗠 𝟰. 𝗧𝗛𝗘 𝗗𝗘𝗖𝗢𝗠𝗠𝗜𝗦𝗦𝗜𝗢𝗡𝗜𝗡𝗚 𝗣𝗟𝗔𝗡

Every model in production should have a written answer to a single question: when does this model leave production, and who has the authority to make that call?

Most do not. Most healthcare AI models are deployed with a build plan, a deployment plan, and no plan at all for the day they stop being useful. The result is a class of model I have started calling zombie AI. Still running. Still producing outputs. Still being trusted because it has been there a long time. Nobody on the current team can fully explain how it was trained or what its known failure modes are.

The decommissioning plan does not have to be elaborate. It has to exist. Conditions that trigger sunset. The person with the authority to sunset. The communication plan to the clinicians who have been relying on the output. The successor plan, if there is one.

A model without a decommissioning plan is borrowing legitimacy from the institution that hosts it. That loan eventually comes due.

𝗜𝗧𝗘𝗠 𝟱. 𝗧𝗛𝗘 𝗤𝗨𝗜𝗘𝗧 𝗥𝗘-𝗩𝗔𝗟𝗜𝗗𝗔𝗧𝗜𝗢𝗡

Every six months, I run the full pre-deployment checklist against the model again. Not a delta check. The whole thing. Population coverage. Clinical reasoning walkthrough. Privacy and governance. Failure-mode test. Off-switch and monitoring.

Most teams skip this because it feels redundant. It is not redundant. The model has been touching production data, scoring an evolving population, and absorbing operational changes for six months. Re-running the original gate on the version of the model that actually exists today, against the data it actually sees today, has caught issues that no continuous monitoring dashboard would have surfaced.

The first time a team does this it feels like overkill. The second time it feels like the only sensible thing on the calendar. The third time, the conversation about whether to do the fourth one is not happening anymore.

𝗧𝗛𝗘 𝗣𝗔𝗧𝗧𝗘𝗥𝗡 𝗕𝗘𝗛𝗜𝗡𝗗 𝗧𝗛𝗘 𝗙𝗜𝗩𝗘

The five pre-deployment items in Issue #9 protect a model from going live with a problem nobody noticed.

The five post-deployment items protect a model from quietly turning into a different model than the one that was approved.

Both lists exist for the same reason. A healthcare AI model is not a feature. It is a piece of clinical infrastructure that earns and re-earns the right to be in production. The day a team starts treating it as deployed and done is the day the slow drift begins.

If you have a post-deployment item I am missing, reply or comment below. The version of this list I run two years from now will be different from this one, and the discipline is more useful when more operators have written down what they actually do.

- Guryash

P.S. The next issue is going to be the part most people do not enjoy writing about. The cost line of healthcare AI. Where the money actually goes inside a real deployment, what surprises a first-time buyer, and what the unit economics look like once a model has been running for a year. That issue is coming.

Want more? Follow me on LinkedIn where I share daily insights on healthcare AI implementation: linkedin.com/in/guryashsingh
