
How to Build an AI Change Control Process in Regulated Systems

Executive Summary

Change control is the discipline that prevents validated systems from drifting out of compliance over time. Traditional GxP change control was built for software where changes are deliberate, discrete, and authored by humans. AI systems break those assumptions. Models can be retrained on new data without anyone editing code. Vendor model updates can shift behavior in ways that aren’t visible in change-log diffs. Drift can accumulate silently between explicit changes. A change control process that wasn’t designed for these dynamics will either miss material changes or treat every minor update as a major event — both of which are failure modes regulators have started to flag.

This article lays out a practical framework for AI change control in regulated environments. We cover the types of changes the process must handle, the triggers that initiate change control, the impact assessment that determines the appropriate response, the approval paths and documentation that satisfy inspection, and the post-change verification that closes the loop. The goal is a change control process that is rigorous where rigor matters and proportionate where it doesn’t.

Within the first 18 months, 60% of pharma AI deployments encounter material change events that the original change control process was not equipped to handle. The most common gap is failure to define what constitutes a material change for AI specifically.1

Why Traditional Change Control Breaks for AI

Traditional change control is built around the idea that changes are events: someone decides to modify the system; the change is described, reviewed, and approved; it is implemented; and verification confirms the system still meets requirements. The model assumes that between changes the system is static — its behavior is fixed by its code and configuration, and what was true at validation remains true until the next deliberate change.

AI systems violate this assumption in several ways. First, the system’s behavior depends on a model whose internal parameters are not human-readable in any practical sense — meaning a “no code change” event can still produce a behavioral change if the model is retrained or replaced. Second, vendor-supplied AI components can change without the organization’s awareness if the vendor updates the underlying model behind a stable API. Third, even without explicit changes, the operational environment can drift in ways that change the model’s effective performance — input distributions shift, edge cases that were rare become common, and what was validated stops describing reality.

A change control process that doesn’t recognize these dynamics will miss material changes. The system will drift out of its validated state without any change control event ever firing, and the organization will discover the gap during inspection or, worse, during an incident. The fix is not to abandon change control but to extend it: redefine what counts as a change, redefine the triggers, and adapt the impact assessment to the AI-specific risk surface.

The Change Types Your Process Must Handle

An AI-aware change control process must recognize several categories of change that don’t exist in traditional CSV.

Explicit changes

These are the changes that look most like traditional change control: deliberate updates to code, configuration, prompts, model selection, or integration. They follow the familiar pattern — a change is proposed, evaluated, approved, implemented, and verified. AI-specific dimensions still apply, particularly around the validation evidence required to demonstrate that the changed system continues to meet its intended use.

Model retraining events

When a model is retrained on updated data, its parameters change even though no human “edited” anything in the traditional sense. The behavioral consequences can range from imperceptible to substantial depending on what changed in the training data and how. The change control process must treat retraining events as first-class changes with their own evaluation, approval, and verification requirements — even when they are scheduled or automated.

Vendor model updates

For AI capabilities provided by external vendors, model updates may happen on the vendor’s schedule rather than yours. The change control process must establish how vendor updates are detected, evaluated, and managed — including the contractual provisions that give the organization visibility into and control over vendor-side changes that could affect validated use cases.

Data drift and operational drift

The hardest category to operationalize. The model itself hasn’t changed, but the environment around it has — input distributions have shifted, edge cases are appearing more often, performance metrics are trending away from their validated baselines. These aren’t traditional changes, but they have the same effect: the system is no longer in its validated state. The change control process must define how drift is detected and what response it triggers.
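Operationally, drift detection means comparing live input distributions against the validation-time baseline and firing a trigger when a divergence metric crosses a defined threshold. A minimal sketch in Python using the population stability index (PSI); the metric choice, bin count, and thresholds are illustrative assumptions, not prescribed values:

```python
import numpy as np

def population_stability_index(baseline, live, bins=10):
    """PSI between the validation-time input distribution and the live one.

    PSI is one common drift metric; the bin count here is an assumption.
    """
    # Bin edges are fixed from the baseline so comparisons stay stable over time
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_counts, _ = np.histogram(baseline, bins=edges)
    live_counts, _ = np.histogram(live, bins=edges)
    eps = 1e-6  # avoids log(0) for empty bins
    base_pct = base_counts / base_counts.sum() + eps
    live_pct = live_counts / live_counts.sum() + eps
    return float(np.sum((live_pct - base_pct) * np.log(live_pct / base_pct)))

def drift_response(psi):
    """Map a drift score to a change control trigger (thresholds assumed)."""
    if psi >= 0.25:
        return "OPEN_CHANGE_RECORD"  # material drift: formal impact assessment
    if psi >= 0.10:
        return "INVESTIGATE"         # moderate drift: review, document, watch
    return "NONE"
```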

Configuration and prompt changes

For LLM-based systems, prompt changes can substantially alter behavior without touching the underlying model. A prompt update is a change. The change control process must include prompt management with versioning and impact assessment proportional to the risk of the use case.
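At minimum, prompt versioning means treating the approved prompt text as a controlled artifact whose identity can be verified at deploy time. A minimal sketch, assuming a simple in-memory registry; the structure and field names are illustrative:

```python
import hashlib
from datetime import datetime, timezone

def register_prompt(registry, use_case_id, prompt_text, change_record_id):
    """Record an approved prompt version as a controlled artifact.

    The content hash ties the exact approved text to a change record,
    so an unreviewed edit is detectable.
    """
    entry = {
        "version": len(registry.setdefault(use_case_id, [])) + 1,
        "sha256": hashlib.sha256(prompt_text.encode("utf-8")).hexdigest(),
        "change_record": change_record_id,  # links the edit to change control
        "approved_at": datetime.now(timezone.utc).isoformat(),
    }
    registry[use_case_id].append(entry)
    return entry

def verify_deployed_prompt(registry, use_case_id, deployed_text):
    """Fail closed: the live prompt must match the latest approved hash."""
    approved = registry[use_case_id][-1]["sha256"]
    return hashlib.sha256(deployed_text.encode("utf-8")).hexdigest() == approved
```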

Infrastructure and integration changes

Changes to the surrounding infrastructure — data pipelines feeding the model, integration with downstream systems, identity and access management, monitoring tooling — can affect AI behavior even when nothing about the model itself changes. A pipeline update that subtly alters the format or freshness of input data can shift the model’s effective behavior without any AI-specific change appearing in the change log. Recognizing infrastructure changes as potential AI changes — and routing them through impact assessment with that lens — is one of the operational disciplines that distinguishes mature AI change control. The standard infrastructure change process is rarely calibrated to detect downstream AI behavioral effects, which means the AI change control process needs to overlay it deliberately.

Defining Change Triggers

A change trigger is the event that initiates the change control process. For traditional software, triggers are clear: someone wants to modify the system. For AI systems, triggers must be defined more broadly because changes can occur without human intent.

| Trigger Type | Description | Detection Mechanism |
| --- | --- | --- |
| Proposed change | A team proposes to modify code, config, prompts, or model selection | Standard change request workflow |
| Scheduled retraining | A planned model retraining event using updated data | Retraining schedule and pre-event review |
| Vendor change notification | The vendor announces a model update or behavioral change | Vendor relationship monitoring; contractual notification provisions |
| Drift threshold exceeded | Monitoring detects performance or input distribution drift beyond defined thresholds | Continuous monitoring with alerting |
| Incident or near-miss | An operational event reveals behavior outside the validated envelope | Incident management integration |
| Periodic review finding | A scheduled periodic review identifies a gap requiring change | Periodic review process |

The trigger taxonomy is the front door of the change control process. If a category of change can occur but isn’t represented in the triggers, that category will silently bypass change control entirely. Inspectors who understand AI will probe specifically for the trigger categories that traditional change control would miss.

Sakara Digital perspective: The single most overlooked trigger category is vendor-side change. Organizations frequently lack visibility into vendor model updates and assume the vendor’s API stability implies behavioral stability. The contractual and operational mechanisms that surface vendor changes — and the change control responses they trigger — are the most common gap we find in pharma AI change control assessments.
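One operational mechanism for surfacing silent vendor-side changes is behavioral fingerprinting: periodically running a fixed set of canary inputs through the vendor API and comparing the outputs against the last known-good run. A minimal sketch, assuming a `call_model` client function (hypothetical) and deterministic generation settings; in practice LLM outputs are rarely bit-identical, so a similarity comparison may be needed instead of an exact hash:

```python
import hashlib

def behavioral_fingerprint(call_model, canary_inputs):
    """Hash the vendor model's outputs on a fixed, curated canary set.

    `call_model` stands in for whatever client invokes the vendor API.
    With deterministic settings, a changed fingerprint is a strong signal
    that the model behind the stable API has been updated or replaced.
    """
    digest = hashlib.sha256()
    for text in canary_inputs:
        digest.update(call_model(text).encode("utf-8"))
    return digest.hexdigest()

def check_vendor_drift(call_model, canary_inputs, last_known_good):
    """Fire the vendor-change trigger when the fingerprint moves."""
    current = behavioral_fingerprint(call_model, canary_inputs)
    return "VENDOR_CHANGE_TRIGGER" if current != last_known_good else "NONE"
```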

Impact Assessment for AI Changes

Once a change is triggered, the impact assessment determines the appropriate response. The assessment evaluates how the change affects the validated state, what risks it introduces, and what activities are required to restore validated status.

For AI changes, the impact assessment must address dimensions that traditional CSV doesn’t typically cover: behavioral change beyond what the change description implies; performance change on validated benchmarks and on edge cases; changes to model lifecycle artifacts (training data, model card, performance documentation); and changes to the conditions under which the original validation held. The assessment may conclude that the change is minor and requires only documentation, that it requires partial revalidation of specific use cases or features, or that it requires full revalidation of the use case.

The “behavioral envelope” concept

A useful concept for AI impact assessment is the behavioral envelope: the set of conditions under which the model has been demonstrated to perform adequately for its intended use. A change that keeps the system within its validated envelope requires lighter response than one that pushes outside the envelope. The envelope can be characterized by performance metrics on representative data, behavior on defined edge cases, and integration boundaries. Changes are evaluated against whether they preserve, narrow, or expand the envelope.
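One way to make the envelope operational is to encode each validated performance characteristic with its acceptance floor and validation-time value, then classify a change by what it does to each. A minimal sketch; the classification rules are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class EnvelopeMetric:
    name: str
    floor: float            # minimum acceptable value from validation
    validated_value: float  # value demonstrated at validation

def assess_against_envelope(envelope, post_change_metrics):
    """Classify each metric as outside, narrowed, or preserved.

    Illustrative rule: below the floor is outside the envelope; between
    the floor and the validated value narrows it; at or above preserves it.
    """
    verdicts = {}
    for m in envelope:
        value = post_change_metrics[m.name]
        if value < m.floor:
            verdicts[m.name] = "OUTSIDE"    # revalidation required
        elif value < m.validated_value:
            verdicts[m.name] = "NARROWED"   # document and justify
        else:
            verdicts[m.name] = "PRESERVED"  # light-touch response
    return verdicts
```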

Pre-change vs. post-change evidence

The impact assessment may require pre-change evidence (performance characterization on test sets representative of the validated envelope) or post-change evidence (verification on the same test sets after the change). Higher-risk changes typically require both, with explicit decision points that allow rollback if the post-change evidence doesn’t meet acceptance criteria. Building these decision points into the change protocol — rather than handling rollback as an exception — is one of the practical engineering disciplines that separates mature AI change control programs from immature ones.
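A sketch of such a decision point, with `rollback` and `promote` as stand-ins for whatever deployment mechanism the program uses (both hypothetical):

```python
def post_change_gate(post_metrics, acceptance_criteria, rollback, promote):
    """Explicit go/no-go point built into the change protocol.

    `acceptance_criteria` maps metric names to minimum acceptable values;
    failing any criterion triggers rollback rather than ad hoc exception
    handling.
    """
    failures = [name for name, floor in acceptance_criteria.items()
                if post_metrics[name] < floor]
    if failures:
        rollback(reason=f"acceptance criteria not met: {failures}")
        return False
    promote()
    return True
```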

Behavioral comparison against a stable benchmark

One specific technique worth highlighting is maintaining a stable benchmark dataset that every change is evaluated against. The benchmark is a curated set of inputs representative of the operational envelope, with known expected behavior or human-graded reference outputs. Each candidate change runs against the benchmark, and the comparison to the prior version becomes a key input to the impact assessment. The benchmark itself is a configuration-controlled artifact that evolves slowly and deliberately — changes to the benchmark are themselves change-controlled events. Over time, the benchmark becomes one of the most valuable assets in the AI quality program because it provides comparable performance signals across the system’s lifetime.
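In code, the comparison itself is straightforward; the value is in the curation of the benchmark and the stability of the grading. A minimal sketch, with `grade` as a stand-in for whatever scoring function (exact match, rubric, human-graded reference) anchors the benchmark; the names are illustrative:

```python
def compare_on_benchmark(benchmark, prior_model, candidate_model, grade):
    """Run the prior and candidate versions against the controlled benchmark.

    `benchmark` is a list of (input, reference) pairs; `grade` scores one
    output against its reference. Per-item deltas, not just the aggregate,
    feed the impact assessment; a flat average can hide edge-case regressions.
    """
    results = []
    for item, reference in benchmark:
        prior_score = grade(prior_model(item), reference)
        cand_score = grade(candidate_model(item), reference)
        results.append({"input": item,
                        "prior": prior_score,
                        "candidate": cand_score,
                        "delta": cand_score - prior_score})
    regressions = [r for r in results if r["delta"] < 0]
    return results, regressions
```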

Approval Paths and Documentation

Approval paths must scale to impact, just as validation rigor scales to risk. A trivial prompt update for a Tier 1 use case shouldn’t require the same approval ceremony as a model replacement for a Tier 3 use case. The change control process should define approval paths by the combination of use case tier and change impact (a minimal routing sketch follows the list):

  1. Routine change. Low-impact changes on lower-tier use cases. Local approval, standard documentation, light verification.
  2. Standard change. Moderate-impact changes on moderate-tier use cases. Cross-functional review, documented impact assessment, verification testing before promotion.
  3. Major change. High-impact changes on any tier, or any change on Tier 3 use cases. Formal change control board review, comprehensive impact assessment, full revalidation of affected components.
  4. Emergency change. Changes required to address immediate operational or compliance issues. Streamlined approval with explicit catch-up documentation and post-implementation review.
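A minimal routing sketch under these definitions; the tier numbering follows the Tier 1–3 convention used above, and the impact levels and exact rules are illustrative assumptions:

```python
def route_change(tier: int, impact: str, emergency: bool = False) -> str:
    """Map (use case tier, change impact) to an approval path.

    Illustrative routing: emergencies take the streamlined path with
    mandatory post-event review; Tier 3 and high-impact changes always
    go to the change control board.
    """
    if emergency:
        return "emergency"  # streamlined approval + catch-up documentation
    if tier >= 3 or impact == "high":
        return "major"      # change control board, full revalidation scope
    if tier == 2 or impact == "moderate":
        return "standard"   # cross-functional review + verification testing
    return "routine"        # local approval, light verification
```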

The documentation must be sufficient for an inspector to reconstruct what changed, why, and how the organization satisfied itself that the changed system remained fit for use. This is the test the documentation has to pass — not whether it follows a template, but whether it answers the questions an inspector will ask.

Post-Change Verification and Monitoring

Post-change verification confirms that the change was implemented as approved and that the system continues to meet its intended use. For AI systems, verification often requires comparison of pre-change and post-change behavior on a representative test set, particularly for the performance characteristics that anchor the validation. The verification evidence becomes part of the validation package, extending the running record of how the system has performed across its lifecycle.

Monitoring after the change should be elevated for a defined period. The first weeks after a change are when behavioral surprises tend to surface — often in conditions that weren’t fully covered by pre-change testing. Elevated monitoring catches these surprises before they propagate, and the heightened monitoring period itself becomes part of the post-change protocol.
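One simple way to implement the heightened period is to scale alert thresholds by the recency of the last change. A sketch; the tightening factor and window length are illustrative and should come from the change protocol, scaled to assessed impact:

```python
from datetime import datetime, timedelta, timezone

def active_alert_threshold(base_threshold, deployed_at,
                           tighten_factor=0.5, window_days=30):
    """Tighten alerting for a defined window after a change goes live.

    During the window, alerts fire at half the normal deviation (both the
    factor and the window are assumptions); afterward, thresholds revert.
    `deployed_at` must be a timezone-aware datetime.
    """
    in_window = datetime.now(timezone.utc) - deployed_at < timedelta(days=window_days)
    return base_threshold * tighten_factor if in_window else base_threshold
```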

Closing the loop with the QMS

Post-change verification feeds back into the QMS in several ways. The validation package is updated. The model card is revised. The training records are updated if the change affected user-facing behavior. The risk register is reviewed for changes to risk profile. Closing all of these loops is the operational discipline that distinguishes change control from change announcement. Programs that announce changes but don’t close the QMS loops accumulate documentation debt that becomes visible in the third or fourth inspection cycle.

Communication to affected users

An often-overlooked element of post-change activity is communication to the people who use the AI capability. When a model update changes behavior in user-visible ways, users need to know — both to set appropriate expectations and to maintain the trust calibration that effective human-in-the-loop review depends on. A change that silently shifts AI behavior can erode user trust in ways that are hard to reverse. Building user communication into the post-change protocol — at a level proportionate to the change impact — preserves the human-side conditions that make AI capabilities effective in regulated workflows. The communication doesn’t need to be elaborate; it needs to be accurate and timely.

Scaling the Process Across the Portfolio

The framework above describes change control for a single use case. Scaling it across a portfolio of dozens or hundreds of AI use cases requires additional engineering.

First, the process must be supported by tooling that doesn’t depend on heroic effort to operate. Change requests, impact assessments, approvals, and verification evidence should live in workflow tooling — ideally the same tooling that handles the rest of the QMS — with audit trails that don’t require manual reconstruction. Spreadsheets and email threads do not scale to portfolio-level AI change control.

Second, the process must distinguish between portfolio-level changes (affecting many use cases) and use-case-level changes (affecting one). A foundation model upgrade may affect every downstream use case; a prompt change affects one. The process must accommodate both without forcing portfolio-level changes through use-case-level workflows or vice versa.
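Operationally, distinguishing the two requires a dependency map from shared components to the use cases built on them, so a portfolio-level change fans out into per-use-case impact assessments rather than hiding its blast radius in a single record. A minimal sketch; the component and use case identifiers are hypothetical:

```python
# Hypothetical dependency map: shared component -> dependent use cases
DEPENDENCIES = {
    "foundation-model-x": ["uc-101", "uc-102", "uc-205"],
    "ingest-pipeline-v2": ["uc-102"],
}

def fan_out_assessments(changed_component):
    """Open one impact assessment per affected use case for a
    portfolio-level change to a shared component."""
    affected = DEPENDENCIES.get(changed_component, [])
    return [f"impact-assessment:{use_case}" for use_case in affected]
```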

Third, the process should include mechanisms for cross-portfolio learning. When a change to one use case reveals a pattern (a vendor’s model updates introduce a recurring behavioral issue, for example), the learning should be captured and applied to other use cases that may face the same dynamic. This kind of cross-portfolio learning is one of the differentiators of mature AI quality programs.

Common Pitfalls and How to Avoid Them

Several patterns recur across pharma AI change control implementations. Each is worth recognizing in advance.

Treating model updates as configuration changes. A model update is more substantive than a configuration change because the system’s behavior derives from the model itself. Treating it as configuration trivializes the impact assessment and produces validation gaps.

Failing to capture vendor-side changes. Without explicit detection mechanisms — contractual notification, monitoring, periodic review — vendor-side changes can occur invisibly. Programs that haven’t built these mechanisms in are vulnerable to behavioral drift they can’t even attribute.

Excessive ceremony for low-impact changes. The opposite failure mode. Change control that requires the same ceremony for every change drives the program to avoid changes — including changes that would improve the system — or to bypass change control through informal workarounds. Proportionate change control supports rather than fights legitimate evolution.

Documentation that doesn’t tell the story. Change records that capture what changed but not why or how the impact was assessed leave inspectors unable to evaluate whether the change was handled appropriately. The narrative quality of the documentation matters as much as its completeness.

Treating change control as separate from monitoring. Change control and monitoring are two sides of the same discipline. Without monitoring, drift triggers can’t fire; without change control, monitoring findings don’t translate into action. Programs that treat them as separate functions tend to have gaps where the two should intersect.

Allowing emergency change to become the default. Some programs use the emergency change path liberally to avoid the friction of standard change control. Over time, the emergency path becomes the actual operating norm and the standard path becomes vestigial. Inspectors notice this pattern quickly — the proportion of changes processed through emergency channels is a leading indicator of change control health. Discipline about reserving emergency change for genuine urgency, with explicit post-event documentation and review, is part of keeping the change control process credible.

AI change control is not change control with extra steps. It is change control adapted for a class of systems whose behavior depends on artifacts that traditional change control wasn’t designed for. Done well, it preserves the validated state of the AI portfolio over time and gives the organization the visibility to evolve its AI capabilities deliberately rather than reactively.

Amie Harpe, Founder and Principal Consultant
Amie Harpe is a strategic consultant, IT leader, and founder of Sakara Digital, with 20+ years of experience delivering global quality, compliance, and digital transformation initiatives across pharma, biotech, medical device, and consumer health. She specializes in GxP compliance, AI governance and adoption, document management systems (including Veeva QMS), program management, and operational optimization — with a proven track record of leading complex, high-impact initiatives (often with budgets exceeding $40M) and managing cross-functional, multicultural teams. Through Sakara Digital, Amie helps organizations navigate digital transformation with clarity, flexibility, and purpose, delivering senior-level fractional consulting directly to clients and through strategic partnerships with consulting firms and software providers. She currently serves as Strategic Partner to IntuitionLabs on GxP compliance and AI-enabled transformation for pharmaceutical and life sciences clients. Amie is also the founder of Peacefully Proven (peacefullyproven.com), a wellness brand focused on intentional, peaceful living.

