Executive Summary
Periodic review is the quality discipline that confirms a validated system remains in its validated state: that the conditions under which the original validation was performed still hold, that performance has not drifted, and that the documentation accurately reflects current operation. For traditional software, periodic review can lean heavily on the assumption that the underlying system is stable between explicit changes. For AI systems, that assumption is wrong. Models drift, environments evolve, vendor updates accumulate, and what was validated three years ago may no longer describe what the system is doing today.
This article describes what a GxP-appropriate periodic review for AI systems actually looks like. We cover the regulatory basis, the scope and cadence considerations, the specific elements that an AI periodic review must address beyond what traditional CSV review covers, the evidence to compile, and the operational practices that make the discipline sustainable across a growing AI portfolio. Programs that get periodic review right detect problems early; programs that neglect it discover those problems at inspection.
Why Periodic Review Matters More for AI
The fundamental insight behind periodic review is that systems decay. The validated state at go-live is not the same as the operational state two years later. The underlying technology evolves, the operational environment shifts, the user base changes, and the documentation falls behind reality. Periodic review is the deliberate inspection that surfaces this decay before it becomes a quality event.
For traditional software, decay tends to be slow and discrete — code changes are tracked, configuration updates are logged, and the system between explicit changes is fundamentally stable. Periodic review confirms that the explicit changes have been handled correctly and that nothing material has been missed. The work is meaningful but bounded.
AI systems decay faster and in more dimensions. The model itself can be retrained or replaced. The vendor may update the underlying capability. The input distribution shifts as user behavior evolves. Edge cases that were rare become common. Performance metrics drift. The set of conditions under which the system was validated may no longer describe how it’s actually used. Each of these is a vector for the validated state to drift away from operational reality, and most of them are invisible without deliberate review.
This is why periodic review is more central to AI quality than to traditional CSV. The system has more ways to drift, and many of those ways don’t fire change control triggers. Without periodic review, the drift accumulates silently — until an inspector or an incident makes it visible.
The Regulatory Basis for AI Periodic Review
Periodic review is well-established in GxP quality systems. EU Annex 11 explicitly requires periodic evaluation of computerized systems to confirm continued validation status. The PIC/S guidance on computerized systems echoes the requirement. GAMP 5 Second Edition treats periodic review as a core lifecycle activity for any GxP computerized system, with rigor scaled to system risk and complexity.
For AI specifically, the regulatory framing is still developing but converges on the same direction. FDA’s guidance on AI in drug development emphasizes ongoing performance monitoring and the need for processes that detect and respond to model drift. The EMA reflection paper highlights the lifecycle nature of AI quality and the need for continued evaluation. ISO/IEC 42001 includes monitoring and continual improvement as core elements of an AI management system. The EU AI Act’s provisions for high-risk AI systems require post-market monitoring that has overlap with — though is not identical to — pharma periodic review.
Translating these requirements into practice means that the periodic review for AI systems must explicitly address the AI dimensions of the system, not just the traditional CSV dimensions. A periodic review that checks the system’s general health but doesn’t evaluate model performance, drift, or vendor-side changes is not adequate to the AI risk surface and won’t satisfy a knowledgeable inspector.
Defining the Scope and Cadence
The scope and cadence of periodic review should reflect the risk tier of the use case. The same risk-based principles that apply to validation apply to ongoing review.
| Tier | Typical Cadence | Scope Depth |
|---|---|---|
| Tier 1 (Low) | Annual or biennial | Confirmation of intended use boundaries, basic operational checks |
| Tier 2 (Moderate) | Annual | Performance review, change history, drift indicators, training currency |
| Tier 3 (High) | Semi-annual or annual with quarterly checkpoints | Comprehensive review including formal performance characterization and inspection-readiness assessment |
| Tier 4 (Critical) | Continuous monitoring with formal periodic review | Device-equivalent rigor, with structured periodic review aligned to device QMS expectations |
The cadence should also accommodate trigger-based reviews. A scheduled review may be supplemented by ad-hoc reviews triggered by significant events: a major vendor update, a performance anomaly, an audit finding, or a shift in the operational environment. Trigger-based reviews are not replacements for scheduled review; they are additional safeguards that catch issues which arise off-cycle.
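As a rough sketch of how the cadence policy and trigger events might be captured as configuration rather than tribal knowledge, the Python below encodes the tier-to-cadence mapping from the table above. The tier labels, intervals, and trigger list are illustrative assumptions, not prescribed values.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    TIER_1 = "low"
    TIER_2 = "moderate"
    TIER_3 = "high"
    TIER_4 = "critical"


@dataclass
class ReviewSchedule:
    """Scheduled cadence plus optional interim checkpoints."""
    months_between_reviews: int
    interim_checkpoint_months: int | None = None  # e.g. quarterly checkpoints for Tier 3
    notes: str = ""


# Illustrative mapping only; actual cadences come from the organization's procedures.
REVIEW_POLICY: dict[RiskTier, ReviewSchedule] = {
    RiskTier.TIER_1: ReviewSchedule(24, notes="annual or biennial"),
    RiskTier.TIER_2: ReviewSchedule(12),
    RiskTier.TIER_3: ReviewSchedule(12, interim_checkpoint_months=3),
    RiskTier.TIER_4: ReviewSchedule(12, interim_checkpoint_months=1,
                                    notes="continuous monitoring plus formal review"),
}

# Events that supplement, never replace, the scheduled review.
OFF_CYCLE_TRIGGERS = [
    "major vendor update",
    "performance anomaly",
    "audit finding",
    "shift in the operational environment",
]
```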
The Elements an AI Periodic Review Must Cover
An adequate AI periodic review covers traditional CSV elements plus several AI-specific elements that don’t appear in conventional review templates.
Performance review
How is the AI performing on its intended use? The review should examine performance metrics over the review period, comparing against the performance characterized during validation and against any acceptance thresholds defined in the validation plan. Performance should be examined both in aggregate and across operationally relevant subsets — performance that’s healthy in aggregate can mask significant degradation on specific subpopulations or use conditions.
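A minimal sketch of that comparison, assuming the review period's outcomes are available as a table with a boolean `correct` column and a `site` column as the operational subset, is shown below. The column names, the accuracy metric, and the threshold parameters are assumptions for illustration, not a prescribed method.

```python
import pandas as pd


def performance_review(
    period_results: pd.DataFrame,
    baseline_accuracy: float,
    acceptance_threshold: float,
    subset_column: str = "site",
) -> pd.DataFrame:
    """Compare review-period accuracy to the validation baseline and acceptance
    threshold, both in aggregate and per operational subset, and flag breaches."""
    rows = []
    groups = [("aggregate", period_results)] + list(period_results.groupby(subset_column))
    for name, frame in groups:
        accuracy = frame["correct"].mean()
        rows.append({
            "subset": name,
            "n": len(frame),
            "accuracy": round(accuracy, 3),
            "delta_vs_validation": round(accuracy - baseline_accuracy, 3),
            "below_threshold": accuracy < acceptance_threshold,
        })
    return pd.DataFrame(rows)
```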
Drift assessment
Has the input distribution drifted from what the model was trained and validated against? Has the model’s output distribution shifted? Drift assessment should examine both input drift (the world has changed) and output drift (the model behaves differently on similar inputs). For systems with formal drift monitoring, the review consolidates the period’s drift signals; for systems without, the review may be the first occasion when drift is examined systematically.
The drift assessment should also examine drift in dimensions the original validation may not have anticipated. Operational use of an AI system tends to surface dimensions that weren’t on the radar at validation time — particular subpopulations of input, particular use patterns, particular edge cases. Periodic review is the natural moment to revisit whether the validation evidence still covers the operational reality, or whether the operational reality has expanded in ways that demand additional characterization. This expansion is rarely captured automatically by drift monitoring; it requires the review team to look at the full operational profile and ask whether the validated envelope still fits.
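For the input-drift side of this assessment, one widely used summary statistic is the Population Stability Index, which compares the validation-era distribution of a feature to the review-period distribution. The sketch below is a minimal NumPy implementation for a single numeric feature; the bin count and the stability thresholds mentioned in the comment are conventional rules of thumb, not regulatory values.

```python
import numpy as np


def population_stability_index(
    baseline: np.ndarray, current: np.ndarray, bins: int = 10
) -> float:
    """Population Stability Index (PSI) for one numeric feature.

    Common rules of thumb treat PSI < 0.1 as stable and PSI > 0.25 as a material
    shift, but thresholds should be justified per use case in the review plan."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Current values outside the baseline range fall outside the histogram; clip
    # sparse bins to avoid division by zero and log of zero.
    base_pct = np.clip(base_pct, 1e-6, None)
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))


# Example: a shifted review-period distribution produces an elevated PSI.
rng = np.random.default_rng(0)
psi = population_stability_index(rng.normal(0, 1, 5000), rng.normal(0.3, 1.2, 5000))
```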
Change history review
What changes occurred during the review period — explicit changes through change control, vendor-side changes, retraining events, configuration updates? Were they handled appropriately? Is the cumulative effect of the changes consistent with the validated state? This review is more substantive for AI than for traditional software because change control for AI is a developing discipline and gaps are common.
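One way to make the cumulative-effect question concrete is to consolidate the period's change records from all sources and flag any that lack a documented impact assessment. The sketch below assumes a simple record structure; the source labels and fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ChangeRecord:
    date_logged: date
    source: str           # e.g. "change_control", "vendor_notification", "retraining", "config"
    description: str
    impact_assessed: bool


def change_history_summary(records: list[ChangeRecord]) -> dict:
    """Summarize the period's change history: counts by source, plus the records
    that lack a documented impact assessment (a common gap for AI changes)."""
    by_source: dict[str, int] = {}
    for record in records:
        by_source[record.source] = by_source.get(record.source, 0) + 1
    unassessed = [r for r in records if not r.impact_assessed]
    return {"counts_by_source": by_source, "unassessed_changes": unassessed}
```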
Validation documentation review
Does the validation documentation still accurately describe the system? The model card, validation report, intended use definition, and risk assessment should all reflect current reality. Stale documentation is one of the most common findings — and one of the easiest to address if caught during periodic review rather than during inspection.
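A simple staleness check, comparing each artifact's last revision date against the most recent material change to the system, can feed this part of the review. The artifact names and dates below are hypothetical.

```python
from datetime import date


def stale_documents(
    doc_last_updated: dict[str, date], last_material_change: date
) -> list[str]:
    """Return the artifacts whose last revision predates the most recent
    material change to the system and therefore need review."""
    return sorted(
        name for name, updated in doc_last_updated.items()
        if updated < last_material_change
    )


# Example with hypothetical artifact names and dates.
flagged = stale_documents(
    {
        "model_card": date(2023, 4, 1),
        "validation_report": date(2024, 6, 15),
        "intended_use_definition": date(2023, 4, 1),
        "risk_assessment": date(2024, 6, 15),
    },
    last_material_change=date(2024, 2, 10),
)
# flagged == ["intended_use_definition", "model_card"]
```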
User and training review
Have users been trained on the appropriate use of the AI? Has the training material been updated to reflect any changes? Are users actually using the AI within its intended use boundaries, or has scope drift occurred informally? User-facing review is often where the most actionable findings emerge, because it surfaces gaps between formal expectations and operational reality.
A particular pattern worth examining is whether users have developed informal practices that compensate for AI limitations or extend the AI’s effective scope. These practices may be entirely sensible — operators frequently develop workflow adaptations that improve outcomes — but they’re outside the validated state and need to either be formalized into the validated workflow or addressed as scope creep. Periodic review is the natural moment to surface and adjudicate these informal practices.
Inspection-readiness assessment
If an inspector arrived tomorrow and asked about this AI system, would the documentation tell a coherent story? Could the team explain the validation, the change history, the performance over time, and the corrective actions taken in response to issues? The periodic review is the rehearsal for inspection. Treating it that way produces a more rigorous review than treating it as a paperwork exercise.
Vendor relationship and roadmap review
Where the AI capability is provided by a vendor, the periodic review should examine the vendor relationship explicitly. Has the vendor remained financially stable, strategically aligned, and contractually compliant? Are vendor-side change notifications being received and processed? Is the vendor's roadmap aligned with the organization's continued use of the capability, or are they signaling deprecation, pivot, or reduced investment in the product? Vendor stability has shifted from a procurement concern to an operational risk over the past several years as the AI vendor market has consolidated and pivoted at an unusual pace. Programs that don't review vendor health on a periodic basis tend to discover changes only when they become urgent, at which point the options are narrower and more expensive.
Evidence to Compile and Evaluate
The evidence base for an AI periodic review is broader than for traditional CSV. The review team should compile and evaluate:
- Performance data over the review period. Aggregate metrics, subset metrics, comparison to validation baselines, comparison to acceptance thresholds.
- Drift monitoring outputs. Input distribution metrics, output distribution metrics, alert history, drift response actions taken.
- Change records. All changes through change control, retraining events, vendor-side changes detected, configuration updates.
- Incident and deviation records. Any quality events involving the AI system, root causes, corrective actions, effectiveness assessments.
- User feedback and support tickets. Patterns in user-reported issues, scope of issues, response and resolution.
- Training records. Coverage of the user population, currency of training material, completion rates.
- Documentation status. Currency of the validation package, model card, SOPs, intended use definition, risk assessment.
- Vendor relationship status. Vendor-side changes, communication quality, contract compliance, any vendor stability concerns.
The evidence is the input; the review is the synthesis. A common failure mode is compiling extensive evidence without integrating it into a coherent assessment of system health. The synthesis is what produces actionable findings — not the raw evidence itself.
Handling Findings and Follow-Through
A periodic review without findings is rare for AI systems in active use. Findings should be characterized by severity and routed into the appropriate corrective action channels.
Severity classification
Critical findings indicate the system is no longer in a validated state and require immediate response, potentially including suspension of use until remediation is complete. Major findings indicate a material gap that requires action within a defined timeframe but does not require immediate suspension. Minor findings are improvement opportunities or documentation updates that can be handled through routine quality processes.
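The classification above maps naturally onto a small routing rule. The sketch below encodes the three severity levels and the channel each one feeds; the channel descriptions are assumptions about how a typical QMS would route them, not a definitive workflow.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"   # validated state lost; immediate response, possible suspension
    MAJOR = "major"         # material gap; action within a defined timeframe
    MINOR = "minor"         # improvement or documentation update via routine processes


@dataclass
class Finding:
    summary: str
    severity: Severity


def route_finding(finding: Finding) -> str:
    """Route a periodic-review finding to the appropriate channel by severity."""
    if finding.severity is Severity.CRITICAL:
        return "suspend-use assessment plus immediate CAPA"
    if finding.severity is Severity.MAJOR:
        return "CAPA with a defined due date"
    return "routine quality process (e.g. documentation update)"
```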
CAPA integration
Findings that require corrective or preventive action should be routed through the QMS CAPA process — not handled informally within the project team. CAPA integration provides the audit trail, accountability, and effectiveness verification that distinguishes addressed findings from forgotten findings. The link between periodic review and CAPA is one of the markers of a mature quality program.
Effectiveness verification
Corrective actions need verification that they actually worked. The next periodic review is one verification opportunity, but for material findings, more direct verification within the period is appropriate. The effectiveness verification closes the loop and confirms that the periodic review produced operational improvement, not just documentation.
Trending across review cycles
A single periodic review is a snapshot; the more powerful signal comes from trends across cycles. Are findings recurring? Are corrective actions sticking? Is the system getting more or less stable over time? Trending requires that findings, severity classifications, and corrective actions be captured in a way that allows comparison across reviews. Programs that treat each review as a standalone exercise miss the trend signal entirely. Investing in the review record structure — so that data from successive reviews is comparable and aggregable — produces a portfolio-level intelligence asset that pays dividends as the AI estate grows. The trends often surface portfolio-level issues that individual reviews would miss: a vendor whose models drift consistently across deployments, a workflow pattern that resists adoption, an infrastructure dependency that creates correlated risk across multiple use cases.
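A comparable review record structure makes recurrence detection almost trivial. The sketch below assumes each review cycle records its findings as a list of category labels and flags any category that appears in more than one cycle; the cycle identifiers and category labels are illustrative.

```python
from collections import Counter


def recurring_findings(
    findings_by_cycle: dict[str, list[str]], min_cycles: int = 2
) -> list[str]:
    """Identify finding categories that appear in at least `min_cycles` review cycles.

    `findings_by_cycle` maps a review identifier (e.g. "2023-annual") to the list
    of finding categories recorded in that cycle."""
    appearances = Counter()
    for categories in findings_by_cycle.values():
        appearances.update(set(categories))  # count each category once per cycle
    return [category for category, n in appearances.items() if n >= min_cycles]


# Example: "stale documentation" recurs across two cycles.
recurrent = recurring_findings({
    "2023-annual": ["stale documentation", "drift alert backlog"],
    "2024-annual": ["stale documentation", "training currency gap"],
})
```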
Making the Process Sustainable
Periodic review at scale is a significant operational commitment. Programs that don’t engineer for sustainability tend to find that periodic review either becomes a perfunctory exercise that doesn’t catch real issues, or becomes so demanding that it competes with other work and gets pushed back.
Several practices make periodic review sustainable. First, the evidence compilation should be largely automated. The data sources — performance monitoring, change records, incident systems, training records — should feed into a periodic review dashboard or report that compiles itself rather than requiring manual data gathering for each review. The review team’s effort should go into synthesis and judgment, not data collection.
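A minimal sketch of that self-compiling evidence package, assuming each data source is exposed through a connector callable (the connector functions and system identifier here are stand-ins, not real integrations), might look like this:

```python
from datetime import date
from typing import Callable


def compile_review_package(
    system_id: str,
    period_start: date,
    period_end: date,
    sources: dict[str, Callable[[str, date, date], object]],
) -> dict:
    """Assemble the periodic-review evidence package from registered data sources.

    Each source is a callable (hypothetical connector to the monitoring platform,
    change-control system, incident system, LMS, etc.) that returns the evidence
    for the system and period; the review team's effort then goes into synthesis."""
    package = {"system_id": system_id, "period": (period_start, period_end)}
    for name, fetch in sources.items():
        package[name] = fetch(system_id, period_start, period_end)
    return package


# Example wiring with stand-in source functions.
package = compile_review_package(
    "ai-usecase-042",
    date(2024, 1, 1),
    date(2024, 12, 31),
    {
        "performance": lambda s, a, b: {"aggregate_accuracy": 0.94},
        "changes": lambda s, a, b: ["vendor model update, June 2024"],
        "incidents": lambda s, a, b: [],
        "training": lambda s, a, b: {"completion_rate": 0.98},
    },
)
```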
Second, the review should be staffed appropriately. AI periodic review requires people who understand both quality and AI specifically. Trying to staff it from generalist QA without AI expertise produces reviews that miss AI-specific issues. Building this capability is part of the operational investment in AI quality.
Third, the review cadence should be aligned with the operational rhythm of the use case. A quarterly review for a use case that doesn’t change quarterly is wasteful; an annual review for a use case that changes monthly is inadequate. The cadence should be calibrated to the rate at which change occurs and risk evolves.
Fourth, the periodic review template and process should be a living artifact. As the organization learns what kinds of findings emerge, the template should evolve to surface them more reliably. Programs that use the same template for years without revision tend to have reviews that are increasingly disconnected from where the actual issues are.
Fifth, periodic review findings should feed into program-level learning. Patterns across multiple use cases — recurring drift dynamics, common documentation gaps, vendor-related issues — should inform program-level practices. The periodic reviews are not just use-case-level events; they’re a portfolio-level intelligence source for improving how the program operates.
Integrating periodic review with the broader quality calendar
Periodic review for AI shouldn’t sit in isolation from the rest of the quality calendar. Aligning AI periodic reviews with related quality events — annual product reviews, management reviews, supplier qualification reviews — creates leverage and reduces the risk that AI sits as a side concern. When the management review surfaces a portfolio-level pattern from periodic reviews, that visibility flows into resource allocation, capability investment, and program priorities. Conversely, when management review priorities surface concerns about specific risk dimensions, periodic reviews can be tuned to surface evidence on those dimensions. This integration produces a quality system that operates coherently across AI and the rest of the GxP estate, rather than running AI quality on a parallel track.
Building review capability over time
The first periodic review of a given AI use case is rarely the best one. The review team is learning what to look for, the evidence base is being assembled, and the right cadence is being calibrated. The second and third reviews tend to be substantially more efficient and more incisive as the team accumulates experience. Programs should plan for this learning curve explicitly — staffing the first reviews with extra senior involvement, capturing learnings into the review template and process, and refining the cadence based on what early reviews surface. Programs that expect first-cycle perfection tend to be disappointed; programs that plan for learning produce sustainable quality over multiple cycles.
Periodic review is the discipline that turns one-time validation into ongoing assurance. For AI systems, the discipline is more demanding than for traditional software because the underlying system has more ways to drift. Done well, it catches problems early, satisfies regulators, and builds the organizational learning that makes the AI portfolio safer over time. Done poorly or skipped, it leaves the program vulnerable to the surprises that AI’s particular dynamics produce.
For Further Reading
- GxP and AI tools: Compliance, Validation and Trust in Pharma — EY.
- EU GMP Annex 22: AI Compliance in Pharma Manufacturing — IntuitionLabs.
- Navigating AI Regulations in GxP: A Comparative Look at EU AI Act, EU Annex 22 & FDA AI Guidance — Zifo.
- AI in Pharma and Life Sciences — Deloitte.
- ICH Q10 Pharmaceutical Quality System Guidance: Understanding Its Impact — PubMed Central.
- Generative AI in the pharmaceutical industry: Moving from hype to reality — McKinsey & Company.