Annex 22 Mock Inspection: What a Pharma Quality Team Should Practice Now

Executive Summary

EMA’s draft Annex 22 — published for consultation July 7, 2025, consultation closed October 7, 2025, finalization expected during 2026 — will reshape AI inspection expectations across EU pharma manufacturing once it takes effect. Quality teams that wait until the framework finalizes before practicing the inspection conversation typically deliver weaker performance under actual inspection pressure than teams that practice through structured mock inspections during the preparation window.

This article articulates the mock inspection structure that has worked across our client engagements: how to scope the mock, the scenarios to walk through, the documentation review discipline, the inspector behavior the mock should anticipate, the lessons that typically emerge, and the post-mock actions that translate the mock into operational improvement. The objective is a mock inspection that meaningfully prepares the quality team, not a ceremonial exercise that produces a report and changes nothing.

6-12 months is the typical grace period between EMA finalization of an annex and its legal implementation. For Annex 22, this places the operational implementation window in 2026 to early 2027, with mock inspections during the preparation window producing materially better inspection readiness than reactive preparation under deadline pressure. [1]

Why Mock Inspections Matter for Annex 22

Mock inspections are a well-established discipline in pharma quality. The discipline works because actual inspection performance depends substantially on the quality team’s familiarity with the inspection conversation, the muscle memory of producing requested documentation, and the cross-functional coordination required to respond effectively. Mock inspections build these capabilities in a low-stakes context where errors can be analyzed and addressed.

For Annex 22 specifically, mock inspections matter more than for incremental regulatory changes. Annex 22 introduces categorically new inspection topics — AI risk classification, static/deterministic restrictions for critical functions, human-in-the-loop oversight evidencing, lifecycle management for AI — that most quality teams have not previously been inspected on. The inspection conversation for these topics is not yet routinized in the industry, and the documentation patterns that hold up are still maturing.

Quality teams that practice the inspection conversation through mock inspections during the preparation window develop the muscle memory and identify the documentation gaps before the actual inspections begin. Teams that wait until Annex 22 finalizes consistently produce weaker performance because the muscle memory has not been built and the gaps have not been surfaced under low-stakes conditions.

The broader strategic context, as discussed in Rephine’s analysis of Annex 22 preparation, is that the grace period after publication is typically too short for organizations starting from scratch to build the required disciplines. Mock inspections during the preparation window are one of the most concrete mechanisms quality teams have to compress the preparation effectively.

Scope of the Mock Inspection

The scope of an Annex 22 mock inspection should reflect the scope of expected actual inspections. EMA inspectors applying Annex 22 will typically probe the following areas, drawn from the draft annex’s structure:

  • AI inventory and classification across manufacturing systems
  • Risk-based tiering methodology and its application
  • Compliance with the static/deterministic restriction for critical functions
  • Validation evidence for in-scope AI systems
  • Human-in-the-loop oversight design and evidence
  • Lifecycle management, including change control for AI components
  • Performance monitoring and response procedures
  • Vendor management for AI-enabled manufacturing systems
  • Training and competency for staff working with AI systems
  • Documentation alignment with Annex 22 vocabulary and structure

A mock inspection should cover the substantial majority of these areas, not just the ones the quality team is most comfortable with. The objective is to identify the gaps, and gaps cluster in the areas the team has not yet addressed in depth.

The mock should be scoped to a specific manufacturing site or business unit, not to the enterprise as a whole. Enterprise-scope mocks consistently produce shallower findings because they cannot drill into specific documentation and specific use cases. Site-scope mocks produce findings that are operationally actionable.

The mock should also include both prepared scenarios (where the documentation has been organized in advance) and unprepared scenarios (where the inspection team responds to scenarios it has not previously rehearsed). The unprepared scenarios reveal capability that the prepared scenarios cannot, and inspectors in actual inspections will probe in ways the prepared scenarios will not anticipate.

Scenarios to Walk Through

The scenarios in a mock inspection should reflect the actual inspection conversations EMA inspectors are likely to initiate. The scenarios we have used across client mocks fall into several categories:

| Scenario Type | Example Question | What It Tests |
| --- | --- | --- |
| Inventory walk-through | “Show me your inventory of AI use in manufacturing systems” | Whether the inventory is current, complete, and includes vendor-embedded AI |
| Classification probe | “Walk me through how this use case was classified as moderate-impact” | Whether the classification methodology is substantive or perfunctory |
| Critical function check | “Show me the evidence that this AI is static and deterministic” | Whether the static/deterministic restriction is being evidenced or asserted |
| Validation reconstruction | “Show me the validation evidence that supports this AI’s intended use” | Whether validation is documented to a standard that survives external scrutiny |
| Human oversight evidencing | “Show me the records of human review for this workflow over the past month” | Whether human oversight is real and evidenced, not nominal |
| Change control walk-through | “Show me the change record for the last vendor update to this AI component” | Whether vendor-driven changes flow through documented change control |
| Performance monitoring drill-down | “Show me the performance monitoring data for this AI for the past quarter, and any response triggered” | Whether monitoring is operational and produces actionable signal |
| Incident response | “Walk me through an AI-related deviation from the past six months and how it was handled” | Whether the deviation handling is mature for AI-specific incidents |

Each scenario should be walked through with the quality team responding in real time, with the documentation and the conversation observed by the mock inspection lead. The objective is to surface the specific points at which the response is weak — not to confirm that the documentation exists, but to confirm that the conversation around the documentation is defensible.

A useful discipline during the mock: the mock inspection lead should be willing to probe inconvenient threads, including challenging the reasoning behind risk classifications, asking for evidence of human override actually catching errors, and requesting documentation of vendor change notification practices. Inspectors in actual inspections will probe these threads, and mock inspections that avoid them produce false confidence.
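For the critical function check scenario, one way a team might move from asserting to evidencing is to pair a fingerprint of the validated model artifact with a repeatability check. The sketch below is illustrative only: the function names are hypothetical, it assumes the model can be read as bytes and called through a `predict` function, and identical outputs on a sample are supporting evidence, not proof, of deterministic behavior.

```python
import hashlib
from typing import Any, Callable, Sequence


def artifact_fingerprint(model_bytes: bytes) -> str:
    """SHA-256 hex digest of the model artifact as deployed."""
    return hashlib.sha256(model_bytes).hexdigest()


def is_static(model_bytes: bytes, validated_fingerprint: str) -> bool:
    """True if the deployed artifact matches the fingerprint recorded at validation."""
    return artifact_fingerprint(model_bytes) == validated_fingerprint


def is_deterministic(predict: Callable[[Any], Any],
                     inputs: Sequence[Any],
                     repeats: int = 3) -> bool:
    """True if every input yields the same output on each repeated call."""
    for x in inputs:
        first = predict(x)
        for _ in range(repeats - 1):
            if predict(x) != first:
                return False
    return True
```

In a mock, the interesting question is less whether such a check passes and more whether the team can show when it was last run, against which artifact, and where the result is recorded.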

The Documentation Review

The documentation review is the foundation of the mock inspection. The documentation that should be reviewed includes:

The AI inventory. Is it current? Does it include vendor-embedded AI? Does the classification reflect substantive thinking?
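The three inventory questions above can be turned into a quick pre-mock screen. This is a minimal sketch under stated assumptions: the record fields, the 365-day review window, and the function name are all hypothetical and would need to match the site's own inventory format and SOPs.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class InventoryEntry:
    """One AI use case in the site inventory (field names are illustrative)."""
    system: str                    # host platform, e.g. "MES"
    use_case: str
    vendor_embedded: bool          # True if the AI ships inside a vendor product
    tier: Optional[str]            # risk tier assigned under the site's SOP
    classification_rationale: str  # documented reasoning behind the tier
    last_reviewed: date


def inventory_gaps(entries: List[InventoryEntry], today: date,
                   max_age_days: int = 365) -> List[str]:
    """Return plain-language gaps a mock inspector would likely raise."""
    gaps = []
    for e in entries:
        label = f"{e.system}/{e.use_case}"
        if e.tier is None:
            gaps.append(f"{label}: no tier assigned")
        if not e.classification_rationale.strip():
            gaps.append(f"{label}: classification has no documented rationale")
        if today - e.last_reviewed > timedelta(days=max_age_days):
            gaps.append(f"{label}: last review older than {max_age_days} days")
    # An inventory with no vendor-embedded entries is a prompt to double-check.
    if not any(e.vendor_embedded for e in entries):
        gaps.append("inventory lists no vendor-embedded AI; confirm this is accurate")
    return gaps
```

A screen like this only detects missing fields and stale dates. Whether a rationale reflects substantive thinking still has to be judged by a reviewer.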

Tier classification SOPs. Do they include anchored definitions for each tier? Do the classifications applied to the inventory follow the SOPs consistently?

Validation packages. Do they articulate the intended use clearly? Do they include performance benchmarking against a reviewed reference set? Do they document the validation approach with the rigor Annex 22 expects?

Human-in-the-loop documentation. Does the workflow design make the human checkpoint substantive? Are there records showing the human checkpoint actually performing its validation function?
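One way to look for records of the checkpoint doing real work is to summarize how often reviewers changed or rejected an AI output, and whether they documented why. The sketch below assumes a hypothetical review log with a decision field taking the values "approved", "corrected", or "rejected"; actual record structures will differ by system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReviewRecord:
    """One human review of an AI output (field names are illustrative)."""
    record_id: str
    ai_output: str
    reviewer_decision: str  # "approved", "corrected", or "rejected"
    reviewer_comment: str


def checkpoint_summary(records: List[ReviewRecord]) -> dict:
    """Summarize how often the human reviewer intervened."""
    total = len(records)
    interventions = [r for r in records
                     if r.reviewer_decision in ("corrected", "rejected")]
    undocumented = [r for r in interventions if not r.reviewer_comment.strip()]
    return {
        "total_reviews": total,
        "interventions": len(interventions),
        "intervention_rate": (len(interventions) / total) if total else 0.0,
        "interventions_without_comment": len(undocumented),
    }
```

A very low intervention rate does not by itself show that the checkpoint is nominal, since the AI may simply be performing well. It does tell the team which question to be ready for in the mock.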

Change control records. Are vendor-driven AI changes flowing through documented change control? Are the records substantive or perfunctory?

Performance monitoring records. Is monitoring operational? Are the records showing both the monitoring data and any response procedures triggered?
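The second question, whether records show the response as well as the data, can be screened by looking for limit breaches that have no linked response record. This is a sketch only: the metric, the lower limit, and the idea of a `response_ref` pointing to a deviation or CAPA record are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MonitoringPoint:
    """One monitoring period for one AI system (field names are illustrative)."""
    period: str                  # e.g. "2026-W03"
    metric_value: float          # monitored metric, e.g. agreement with reference results
    response_ref: Optional[str]  # ID of the deviation or CAPA record raised, if any


def breaches_without_response(points: List[MonitoringPoint],
                              lower_limit: float) -> List[str]:
    """Return periods where the metric fell below the limit with no linked response."""
    return [p.period for p in points
            if p.metric_value < lower_limit and not p.response_ref]
```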

Vendor management documentation. Are vendor contracts addressing AI-specific topics including change notification, validation cooperation, and model version pinning? Is vendor qualification substantive?

Training records. Are staff working with AI systems trained on the AI-specific aspects, not just the general system? Are competency assessments documented?

The documentation review should not be a tick-box exercise. The reviewer should be asking whether the documentation would survive substantive scrutiny from an inspector who has been trained on Annex 22 and is probing for substance rather than form.

As Epista’s analysis of preparing for the GMP revisions reinforces, the documentation expectations under Annex 22 are higher than under historical practice, and quality teams should expect the documentation review to surface gaps that have not previously been visible.

Sakara Digital perspective: The single most useful question to ask during the documentation review is: “what would have to be true for an inspector to find this documentation inadequate?” The answer often surfaces specific weaknesses — undocumented assumptions, missing evidence of substantive thinking, gaps in coverage — that the team would not surface by reviewing what is present rather than probing what is absent.

Inspector Behavior to Anticipate

The mock inspection should anticipate the behaviors EMA inspectors are likely to display when applying Annex 22. Several patterns are worth preparing for.

Inspectors will vary in AI expertise. Especially during the early implementation period, inspectors applying Annex 22 will have varying degrees of AI background. Mock inspections should include scenarios where the quality team is asked to explain technical AI concepts to a non-technical inspector, because the explanation itself is what the actual inspector will probe.

Inspectors will probe critical thinking. The most consistent positive observation in modern GxP inspections is documentation that evidences substantive critical thinking. Inspectors increasingly distinguish between documentation that reflects real analysis and documentation that reflects process completion without analysis.

Inspectors will follow the threads. Modern inspections do not stop at the first documented control; they follow threads through change records, performance monitoring, and incident management to determine whether the framework is operating as documented. Mock inspections should follow threads similarly to identify where the framework breaks down.

Inspectors will ask about exceptions. Documentation describes the intended operation; inspectors will ask about the exceptions. “What happens when the AI’s confidence is below threshold? Walk me through a case.” The exception handling is often where the framework’s substance is tested.
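To answer the low-confidence question with a concrete case, the team needs to know exactly what the system does at and around the threshold. The sketch below shows one hypothetical routing rule; the 0.90 threshold, the route names, and the treatment of out-of-range scores as deviations are all assumptions for illustration, not requirements drawn from Annex 22.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One AI output with its confidence score (field names are illustrative)."""
    batch_id: str
    label: str
    confidence: float


def route(pred: Prediction, threshold: float = 0.90) -> str:
    """Decide which path an AI output takes based on its confidence."""
    # Scores outside [0, 1] (including NaN) indicate a malformed output.
    if not 0.0 <= pred.confidence <= 1.0:
        return "deviation"
    if pred.confidence < threshold:
        return "human_review"
    return "standard_workflow"
```

Writing the rule down this explicitly tends to surface the edge cases an inspector will ask about, such as what happens at exactly the threshold value or when the score is missing.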

Inspectors will probe the human checkpoint substance. The human-in-the-loop expectation is well-articulated in Annex 22, and inspectors will probe whether the human checkpoint is substantive. Mock inspections should specifically probe whether the human checkpoint can be demonstrated to catch errors, not just to approve outputs.

Inspectors will request access to live systems. Modern inspections often include direct examination of live systems, not just documentation review. Mock inspections should include scenarios where the mock inspector requests live system access and the quality team walks through the system in real time.

The mock inspection lead should be willing to play these inspector behaviors authentically. Mocks that play the inspector behavior gently produce false confidence; mocks that play it rigorously produce findings that translate into operational improvement.

Lessons That Typically Emerge

The lessons that typically emerge from Annex 22 mock inspections cluster in recognizable categories.

Inventory completeness. Most quality teams discover that their AI inventory is less complete than they thought. Vendor-embedded AI in MES, LIMS, EMS, and similar platforms is consistently underrepresented, and the mock inspection surfaces this gap.

Classification substance. Risk classifications produced quickly during initial implementation often turn out to lack substantive thinking. The mock inspection surfaces classifications that the team cannot defend in the inspection conversation.

Validation documentation rigor. Validation documentation that looks adequate to the team often turns out to be insufficient for the substantive inspection probing. The mock inspection surfaces specific documentation gaps that require remediation.

Human checkpoint substance. Human checkpoints that the team had assumed were substantive often turn out to be nominal under probing. The mock inspection surfaces whether the human checkpoint can demonstrate substantive validation function.

Cross-functional coordination. Mock inspections frequently surface cross-functional coordination gaps — between QA, IT, the use case owner, and the vendor management function — that produce documentation inconsistencies and response delays. The coordination issues are operational, not documentary, and the mock surfaces them in ways routine review does not.

Vendor relationship maturity. Vendor management for AI-enabled manufacturing systems often turns out to be less mature than the team thought. The mock inspection surfaces specific gaps in vendor change notification, validation cooperation, and ongoing oversight.

Inspector vocabulary alignment. Documentation that uses bespoke organizational vocabulary often turns out to be harder for the mock inspector to navigate than documentation aligned with Annex 22 terminology. The mock surfaces specific vocabulary improvements that materially improve inspection navigability.

Post-Mock Actions

The mock inspection’s value is realized in the post-mock actions, not in the mock itself. The following actions translate the mock into operational improvement.

Detailed findings report. The mock should produce a findings report that articulates each gap identified, the documentation or operational element involved, the inspection question that surfaced the gap, and the recommended remediation. The report should be structured similarly to an actual inspection findings report.

Remediation plan with ownership and timing. Each finding should be assigned an owner with a specific remediation timeline. Findings without ownership or timing rarely produce remediation; findings with both consistently do.
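The findings report and the remediation plan can share one record structure, which makes missing ownership or timing easy to spot. The sketch below is a hypothetical layout: the field names mirror the elements listed above (gap, element involved, surfacing question, recommended remediation, owner, timing), but the exact schema is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Finding:
    """One mock inspection finding with its remediation assignment."""
    finding_id: str
    gap: str                     # what was found lacking
    element: str                 # documentation or operational element involved
    surfaced_by: str             # inspection question that surfaced the gap
    remediation: str             # recommended remediation
    owner: Optional[str] = None  # named person accountable for remediation
    due: Optional[date] = None   # committed remediation date


def unassigned(findings: List[Finding]) -> List[str]:
    """IDs of findings missing an owner or a due date."""
    return [f.finding_id for f in findings if not f.owner or f.due is None]


def overdue(findings: List[Finding], today: date) -> List[str]:
    """IDs of findings whose due date has passed."""
    return [f.finding_id for f in findings if f.due is not None and f.due < today]
```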

Cross-functional steering committee review. The findings should be reviewed by the cross-functional AI governance steering committee, not only by QA. The cross-functional review produces broader organizational alignment on the remediation priorities.

Repeat mock at 6-month interval. A single mock inspection produces a snapshot; repeat mocks produce trend signal on whether the remediation is actually producing improvement. The repeat mock at 6-month intervals during the preparation window is one of the most direct mechanisms for tracking inspection readiness over time.
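The trend signal from repeat mocks can be as simple as the change in finding counts per area between two rounds. The sketch below assumes each finding has been tagged with a single area name; the function name and tagging scheme are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def findings_trend(previous: Iterable[str], current: Iterable[str]) -> Dict[str, int]:
    """Change in finding count per area (negative means fewer findings)."""
    prev, curr = Counter(previous), Counter(current)
    return {area: curr[area] - prev[area] for area in sorted(set(prev) | set(curr))}
```

Counts alone can mislead, since a more rigorous second mock may surface more findings in an area that has actually improved, so the numbers are best read alongside the severity and substance of the findings.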

Documentation updates incorporating mock findings. The documentation gaps identified in the mock should produce specific documentation updates. The updates should be tracked through the formal documentation change control process to maintain version integrity.

Training adjustments. The mock typically surfaces training gaps — staff who could not articulate the AI tier classification methodology, IT staff who could not walk through the validation approach, and so on. Training adjustments to address these gaps are direct, actionable outcomes.

Communication to senior leadership. The mock findings, the remediation plan, and the trajectory toward inspection readiness should be communicated to senior leadership. This communication produces resource allocation support for the remediation work and ensures that the mock results inform broader strategic decisions.

Coordinating the mock with broader inspection readiness programs

An integration point worth surfacing: the Annex 22 mock inspection does not exist in isolation. Most pharma sites already have inspection readiness programs covering broader GMP topics, and the Annex 22 mock should be integrated with these programs rather than running in parallel. The integration includes scheduling coordination, scope alignment, and findings consolidation.

Quality teams that run the Annex 22 mock as a separate, AI-specific exercise often discover that the broader inspection readiness program does not absorb the findings effectively, and the AI-specific remediation lags behind the broader inspection readiness work. Integrated programs produce more coherent operational improvement than parallel programs.

How the mock supports cross-functional capability building

Beyond the immediate inspection readiness benefit, mock inspections build cross-functional capability that translates into actual inspection performance. The QA staff who respond to mock inspection probing develop muscle memory that supports actual inspection performance. The IT staff who walk through validation approaches develop the ability to explain AI systems in inspection-appropriate language. The vendor management staff who respond to mock probing develop the discipline that vendor management requires under Annex 22.

This capability building is an underappreciated benefit of mock inspections. Quality teams that view the mock primarily as a findings exercise capture only part of the value; teams that view it as a capability-building exercise capture both the findings benefit and the durable capability improvement that supports inspection readiness across multiple inspection cycles.

The economics of mock inspection investment

A final practical point. Mock inspections require meaningful time investment from the QA, IT, vendor management, and operational teams. The investment is substantial enough that some quality teams defer mock inspections to focus on direct remediation work, with the assumption that the remediation will produce inspection readiness without the mock.

The pattern we have consistently observed is that this trade-off produces worse outcomes than running the mock. Direct remediation without the mock produces remediation against the team’s understanding of what is needed, which often misses the substantive issues an external mock inspection surfaces. The investment in the mock is materially recovered in more efficient remediation, fewer remediation iterations, and better actual inspection performance. Quality leaders making the case for the mock inspection investment should be explicit about this economic argument: the mock is not overhead on top of remediation; it is the mechanism that produces better remediation in less total effort.

References & Sources

  1. Stakeholders’ Consultation on EudraLex Volume 4: Chapter 4, Annex 11 and New Annex 22 — European Commission Public Health. Official consultation page documenting the July 7 to October 7, 2025 consultation window for Annex 22 and companion revisions, and the basis for the preparation timeline.
  2. How to Prepare for Annex 22 — Rephine. Practitioner-grade guidance on the implementation timeline and the operational work sponsors should do during the preparation window.
  3. Preparing for EMA’s 2026 GMP Revisions: Chapter 4, Annex 11, Annex 22 — Epista. Industry analysis of the integrated preparation work required for the three companion revisions, including documentation expectations.
  4. Multistakeholder workshop on expert contributions to AI guidance development (Annex 22) — European Medicines Agency. Reference for the EMA’s stakeholder engagement process during Annex 22 development, including the structure of expected inspection expectations.
  5. ISPE GAMP Community of Practice — International Society for Pharmaceutical Engineering. ISPE’s GAMP work, including the GAMP Guide on Artificial Intelligence (July 2025), provides industry-side operational guidance complementary to Annex 22 that mock inspections should reflect.
  6. PDA/PQRI Workshop on Validation and Oversight of Artificial Intelligence — Parenteral Drug Association. Industry workshop documenting the validation and oversight patterns that align with Annex 22 expectations and that mock inspections should incorporate.
Amie Harpe Founder and Principal Consultant
Amie Harpe is a strategic consultant, IT leader, and founder of Sakara Digital, with 20+ years of experience delivering global quality, compliance, and digital transformation initiatives across pharma, biotech, medical device, and consumer health. She specializes in GxP compliance, AI governance and adoption, document management systems (including Veeva QMS), program management, and operational optimization — with a proven track record of leading complex, high-impact initiatives (often with budgets exceeding $40M) and managing cross-functional, multicultural teams. Through Sakara Digital, Amie helps organizations navigate digital transformation with clarity, flexibility, and purpose, delivering senior-level fractional consulting directly to clients and through strategic partnerships with consulting firms and software providers. She currently serves as Strategic Partner to IntuitionLabs on GxP compliance and AI-enabled transformation for pharmaceutical and life sciences clients. Amie is also the founder of Peacefully Proven (peacefullyproven.com), a wellness brand focused on intentional, peaceful living.

