Real-World Data Quality: From ICD-10 Drift to Useable Cohorts

Executive Summary

Real-world data programs in pharma are exceptionally vulnerable to a quiet failure mode: cohorts that pass quality checks on the day they are built become unreliable over time as ICD-10-CM codes are added, deleted, or reclassified through annual updates. The phenomenon is well-documented in the OHDSI community and in observational research methodology literature, and it is consequential enough that quality leaders treating cohort construction as a one-time activity will inevitably produce evidence that does not hold up under regulatory or internal scrutiny.

This article translates the academic and OHDSI literature on RWD quality into operational guidance for pharma sponsors. We cover why cohorts drift, the mechanics of ICD-10-CM updates that drive the drift, the three data quality dimensions that anchor cohort reliability (conformance, completeness, plausibility), the OHDSI tooling that operationalizes the discipline, cohort design patterns that survive code updates, and the monitoring program that keeps cohort definitions defensible across years of longitudinal study.

In a recent study of diagnosis-code use in age-related macular degeneration cohorts, 72% of the cohort definitions reviewed contained missing or incomplete codes, and only 21% used exact codes [1]. This variation in code use is a structural problem in observational research, and it materially affects who ends up in a cohort and who is excluded.

Why Cohorts Drift in the First Place

A pharma RWE team builds a cohort definition in 2023 for a chronic disease study. The definition uses an ICD-10-CM code set, prescription criteria, and a small number of inclusion and exclusion rules. The cohort passes initial quality checks. The study runs. Two years later, when the team revisits the cohort for a follow-up analysis, they find that something has shifted: patient counts in the most recent year are inconsistent with prior years, certain demographic patterns have changed, and a careful review surfaces three causes — none of which were anticipated when the cohort was first built.

This pattern is recognizable to anyone who has worked with longitudinal observational data. The drift is not caused by the underlying patient population changing. It is caused by the upstream coding ecosystem changing while the cohort definition stays static. The three principal causes:

Annual ICD-10-CM code updates. The Centers for Medicare and Medicaid Services (CMS) and the National Center for Health Statistics (NCHS) maintain ICD-10-CM, and each fiscal year sees additions, deletions, and reclassifications. The FY 2026 update, summarized in the HIAcode FY 2026 ICD-10-CM update analysis, included nearly twice as many new codes as the prior year, with the number of additions per chapter ranging from 1 to 213. A cohort defined against the FY 2023 code set will, over time, miss patients who clinically belong in the cohort but are coded against new ICD-10-CM codes that the original definition does not capture.

Mapping convention drift. Where the cohort is built on a common data model such as OMOP, the mapping from source ICD-10-CM codes to OMOP standard concepts depends on the vocabulary version. As vocabulary versions update, the mapping conventions can shift, particularly for codes near the edges of clinical concepts. The Book of OHDSI’s data quality chapter documents this phenomenon and provides tooling — including the checkCohortSourceCodes function — designed specifically to detect it.

Source system practice changes. Even when codes and mappings stay constant, the practices of the source systems generating the data can shift. A hospital system that introduces a new EMR template, a payer that adjusts its claims adjudication rules, or a clinician group that changes its documentation practices can all produce code-frequency shifts that look like cohort drift but are actually upstream practice change.

The combined effect is that cohorts in longitudinal RWE programs are not stable objects. They require active maintenance, monitoring, and a documented discipline for how the definition is updated as the coding ecosystem evolves. Programs that treat cohort definitions as static produce evidence that systematically degrades in quality over time, and the degradation is invisible without explicit monitoring.

The ICD-10 Update Mechanics That Matter

The annual ICD-10-CM update cycle follows a recognizable rhythm. Updates take effect on October 1 of each year and apply to the federal fiscal year beginning on that date. The FY 2026 cycle, as documented by the UASI FY 2026 update summary, brought 487 new codes, 38 deletions, and 50 revisions, with significant activity in obesity classification, hypoglycemia specificity, eating disorders, lymphoma staging, and psoriasis subtypes.

Mid-year updates also occur. The April 1, 2026 update introduced new codes for specific clinical conditions that emerged during the fiscal year, as detailed in the HIAcode April 2026 update analysis. The mid-year cadence means that even sponsors who synchronize their cohort definitions annually face windows within the fiscal year where the source coding can shift in ways the definition does not anticipate.

Five types of code change have different implications for cohort stability:

Change Type | Mechanism | Cohort Impact
New code addition | A new code is added for a clinical condition not previously distinctly coded | Patients clinically eligible for the cohort may be coded against the new code and excluded by the static definition
Code deletion | An obsolete code is removed; patients formerly coded against it are now coded against alternatives | The cohort definition includes a code that no longer appears in recent data, creating an artificial dropoff
Code revision or reclassification | The clinical scope of an existing code is narrowed, expanded, or shifted | The cohort’s clinical interpretation drifts even when the code set looks unchanged
Granularity expansion | A general code is replaced by multiple more specific codes | Coding practice migrates toward the more specific codes, gradually orphaning the original general code
Chapter restructuring | Codes are moved between chapters or reorganized | Definitions built on chapter-level criteria silently change scope

The granularity expansion pattern is particularly insidious because the original code does not disappear immediately; it remains available but progressively less used as clinicians and coding staff adopt the more specific alternatives. A cohort built on the original code can lose its effective sensitivity over a multi-year window without any single update causing a visible break.
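As a rough illustration of how a team might screen a new release against a static definition, the sketch below compares two code-set releases and reports the cohort codes that were deleted and the new codes that share a three-character category with the cohort. The function name, the data class, and the placeholder codes (deliberately not valid ICD-10-CM codes) are assumptions made for this example; a real review would work from the official release files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseImpact:
    deleted_in_cohort: set[str]      # cohort codes missing from the new release
    new_in_same_category: set[str]   # new codes sharing a 3-character category


def assess_release(cohort_codes: set[str], old_release: set[str],
                   new_release: set[str]) -> ReleaseImpact:
    """Compare two code-set releases and report what touches the cohort."""
    added = new_release - old_release
    deleted = old_release - new_release
    # ICD-10-CM codes are grouped under a 3-character category prefix.
    cohort_categories = {code[:3] for code in cohort_codes}
    return ReleaseImpact(
        deleted_in_cohort=cohort_codes & deleted,
        new_in_same_category={c for c in added if c[:3] in cohort_categories},
    )


# Placeholder codes for illustration only; not real ICD-10-CM content.
old_release = {"AAA.0", "AAA.1", "BBB.0"}
new_release = {"AAA.0", "AAA.10", "AAA.11", "BBB.0", "CCC.0"}
impact = assess_release({"AAA.0", "AAA.1"}, old_release, new_release)
print(impact)
```

A prefix match is only a screening heuristic: it catches granularity expansion within a category but not codes moved to a different chapter, which still need clinical review.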

The Three Data Quality Dimensions That Anchor Cohort Reliability

The OHDSI community has articulated three data quality dimensions that, together, anchor cohort reliability. As described in the 2021 article Increasing Trust in Real-World Evidence Through Evaluation of Observational Data Quality (Blacketer et al.), the dimensions are conformance, completeness, and plausibility, and each addresses a specific category of quality concern.

Conformance measures how well the data conform to the structural and semantic expectations of the data model. For a cohort built on OMOP, conformance includes checks such as: are condition occurrence dates within the patient’s observation window, are values within their permitted ranges, do foreign keys resolve, and are required fields populated. Conformance failures typically indicate ingestion or transformation bugs rather than clinical issues, and they are the most automatable of the three dimensions.

Completeness measures whether the expected data are present. For a cohort, completeness includes checks such as: are condition records present for the periods and populations expected, are prescription records present where expected, are laboratory values present at the expected frequencies. Completeness failures may indicate source system issues, data loss in transformation, or genuine population characteristics — and distinguishing between these is part of the quality discipline.

Plausibility measures whether the data make clinical and statistical sense. For a cohort, plausibility includes checks such as: are age distributions consistent with the disease, are prescription patterns consistent with treatment guidelines, are co-morbidity patterns consistent with clinical expectations. Plausibility failures often indicate mapping issues, source data quality issues, or genuine population shifts — and again, the discipline is in distinguishing.

The three dimensions are complementary rather than independent. A code that has been deleted and is no longer used produces a completeness failure (the records that should be present in recent data are missing) that becomes visible as a plausibility failure (the cohort’s recent prevalence drops in ways inconsistent with clinical expectations). Programs that monitor all three dimensions simultaneously catch drift earlier than programs that monitor only one.
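To make the three dimensions concrete, here is a minimal sketch with one toy check per dimension. The record layout, field names, and thresholds are assumptions for illustration; they are not the OMOP CDM schema and not the checks that DQD runs.

```python
from datetime import date

# Toy records; field names are illustrative and are not the OMOP CDM schema.
records = [
    {"person_id": 1, "condition_date": date(2024, 3, 1), "age": 67, "code": "AAA.1"},
    {"person_id": 2, "condition_date": date(2019, 1, 5), "age": 71, "code": None},
    {"person_id": 3, "condition_date": date(2024, 6, 9), "age": 140, "code": "AAA.1"},
]
# Observation window (start, end) per person; same window for all three here.
windows = {pid: (date(2020, 1, 1), date(2025, 1, 1)) for pid in (1, 2, 3)}


def conformance_failures(records, windows):
    """Persons whose condition date falls outside their observation window."""
    failures = []
    for r in records:
        start, end = windows[r["person_id"]]
        if not start <= r["condition_date"] <= end:
            failures.append(r["person_id"])
    return failures


def completeness(records, field):
    """Fraction of records in which the given field is populated."""
    return sum(r[field] is not None for r in records) / len(records)


def plausibility_failures(records, min_age=0, max_age=120):
    """Persons whose recorded age lies outside a plausible range."""
    return [r["person_id"] for r in records if not min_age <= r["age"] <= max_age]


print(conformance_failures(records, windows))  # date outside the window
print(completeness(records, "code"))           # share of populated codes
print(plausibility_failures(records))          # implausible ages
```

Each check answers a different question about the same rows, which is why a single record can pass one dimension and fail another.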

The OHDSI Tooling That Operationalizes the Discipline

The OHDSI community has built tooling that operationalizes the three-dimension framework, and the tooling has become the de facto standard for OMOP-based RWE programs. The principal components:

Data Quality Dashboard (DQD). An open-source R package that systematically executes more than 3,300 configurable data quality checks against an OMOP CDM instance. DQD covers conformance, completeness, and plausibility, and produces a report that quality teams can review to identify specific issues. As described in the OHDSI stack implementation guide, DQD is typically run as part of the regular refresh cycle, with results reviewed by data engineering and clinical analytics teams.

Achilles. Generates descriptive statistics about an OMOP CDM instance — concept frequencies, demographic distributions, condition prevalences. Achilles output is used by both DQD and the ARES quality reporting tool to characterize the data. For cohort drift detection, Achilles output over time provides the temporal signal that surfaces shifting code frequencies before they invalidate downstream cohorts.

MethodEvaluation R package. Includes the checkCohortSourceCodes function, which takes a cohort definition as input and identifies which source codes map to the concepts in each concept set. The function computes the prevalence of these codes over time, surfacing temporal issues associated with specific source codes. This is the tool most directly relevant to ICD-10 drift detection.

ATLAS and WebAPI. The cohort design and execution platform that most OHDSI users interact with. ATLAS allows cohort definitions to be authored, executed, characterized, and shared across collaborating organizations. Cohort definitions in ATLAS are versioned, which makes the temporal discipline of cohort updates explicit.

The tooling does not eliminate the need for clinical and methodological judgment. It surfaces signals; quality teams interpret them. A code-frequency drop in DQD output could indicate a deleted code, a source system practice change, or a genuine population shift. The tooling tells the team where to look; the team determines what the finding means.
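The idea behind source-code prevalence over time can be sketched in a few lines. This is a conceptual analogue written for this article, not the checkCohortSourceCodes implementation or its interface; the function name, input shapes, and placeholder codes are assumptions.

```python
from collections import Counter, defaultdict


def code_prevalence_by_year(events, persons_by_year):
    """Persons with each source code per 1,000 observed persons, by year.

    events: iterable of (year, source_code, person_id) tuples.
    persons_by_year: {year: number of persons observed in that year}.
    """
    seen = set()
    counts = defaultdict(Counter)
    for year, code, person_id in events:
        # Count each person at most once per code per year.
        if (year, code, person_id) not in seen:
            seen.add((year, code, person_id))
            counts[code][year] += 1
    return {
        code: {year: 1000 * by_year[year] / persons_by_year[year]
               for year in sorted(persons_by_year)}
        for code, by_year in counts.items()
    }


# Placeholder codes: usage migrates from a general code to a more specific one.
events = [
    (2023, "AAA.1", 1), (2023, "AAA.1", 1), (2023, "AAA.1", 2),
    (2024, "AAA.1", 1), (2024, "AAA.10", 2), (2024, "AAA.10", 3),
]
prevalence = code_prevalence_by_year(events, {2023: 1000, 2024: 1000})
for code, by_year in sorted(prevalence.items()):
    print(code, by_year)
```

Reading the per-code series side by side is what exposes migration: the general code falls while the specific code rises, even if the combined total is stable.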

Cohort Design Patterns That Survive Code Updates

The most operationally important question for pharma RWE teams is how to design cohort definitions that survive code updates without requiring constant re-engineering. Several patterns have emerged from the OHDSI community and observational research methodology literature.

Use standard concept sets rather than direct ICD-10-CM codes. When a cohort is built on an OMOP CDM, the definition should reference standard concepts (SNOMED CT for conditions, in OMOP’s vocabulary architecture) rather than source ICD-10-CM codes directly. The OMOP vocabulary maintainers maintain the mappings from source codes to standard concepts, including for new and revised codes. A cohort built on standard concepts inherits the vocabulary maintenance, while a cohort built on source codes requires manual update.

Include concept descendants. OMOP concept sets allow inclusion of descendants, which captures more specific codes as they are added through granularity expansion. A cohort defined to include “Type 2 diabetes mellitus and descendants” automatically captures more specific Type 2 diabetes codes added in future updates.
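The effect of including descendants can be shown with a toy hierarchy. In OMOP the hierarchy is stored in the vocabulary tables (for example CONCEPT_ANCESTOR); the dictionary, the concept names, and the helper below are hypothetical stand-ins used only to show the mechanism.

```python
from collections import deque


def with_descendants(root, children):
    """Return the root concept plus every concept reachable via child links."""
    resolved, queue = {root}, deque([root])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in resolved:
                resolved.add(child)
                queue.append(child)
    return resolved


# Hypothetical hierarchy; the names stand in for standard concept identifiers.
children = {
    "Type 2 diabetes mellitus": ["T2DM with complication A", "T2DM with complication B"],
}
before = with_descendants("Type 2 diabetes mellitus", children)

# A later vocabulary release adds a more specific concept under an existing child.
children["T2DM with complication A"] = ["T2DM with complication A, subtype 1"]
after = with_descendants("Type 2 diabetes mellitus", children)

print(sorted(after - before))
```

The definition itself ("root and descendants") never changed between the two calls; only the hierarchy did, which is the maintenance the cohort inherits from the vocabulary.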

Document the clinical intent separately from the code set. Cohort documentation should articulate the clinical intent (what condition, what population, what clinical reasoning) separately from the specific code implementation. This makes it possible to evaluate whether a future code update warrants a change to the code set without requiring archaeological reconstruction of the original intent.

Maintain a code drift register. A documented register of ICD-10-CM code changes that affect active cohort definitions, with the date of the change, the affected cohort, the assessment of impact, and the resolution. The register provides the institutional memory that prevents drift from accumulating invisibly.
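A register does not need special tooling; a structured record per change is enough. The sketch below is one possible shape, using the fields named above. The class names, the resolution categories, and the sample entries are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Resolution(Enum):
    NO_CHANGE = "no change required"
    MONITOR = "monitor for impact"
    UPDATE = "update the definition"
    REENGINEER = "full re-engineering"


@dataclass(frozen=True)
class DriftEntry:
    change_date: date        # effective date of the code change
    cohort: str              # affected cohort definition
    code_change: str         # what changed in the code set
    impact_assessment: str   # the team's assessment of impact
    resolution: Resolution   # the documented decision


def history_for(register, cohort):
    """All entries for one cohort, oldest first."""
    return sorted((e for e in register if e.cohort == cohort),
                  key=lambda e: e.change_date)


# Hypothetical entries with placeholder codes.
register = [
    DriftEntry(date(2025, 10, 1), "cohort-A", "AAA.1 split into AAA.10/AAA.11",
               "recent-year sensitivity at risk", Resolution.UPDATE),
    DriftEntry(date(2024, 10, 1), "cohort-A", "BBB.0 description revised",
               "no scope change", Resolution.NO_CHANGE),
    DriftEntry(date(2025, 10, 1), "cohort-B", "CCC.0 added",
               "outside clinical intent", Resolution.MONITOR),
]
for entry in history_for(register, "cohort-A"):
    print(entry.change_date, entry.resolution.value)
```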

Use phenotype validation. For high-stakes cohorts, validate the cohort by sampling and clinically adjudicating a subset of patients. Phenotype validation is more expensive than algorithmic validation but produces evidence that holds up under regulatory scrutiny in ways that purely algorithmic cohort definitions do not.
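Adjudication results are usually summarized as a positive predictive value with an interval around it. A minimal sketch follows, assuming a simple random sample and a Wilson score interval; the function name and the sample counts are hypothetical, and other interval methods are equally defensible.

```python
import math


def ppv_with_wilson_ci(confirmed, sampled, z=1.96):
    """Positive predictive value with a Wilson score interval.

    confirmed: sampled cohort members confirmed on clinical adjudication.
    sampled: total cohort members adjudicated.
    z: normal quantile (1.96 for an approximate 95% interval).
    """
    if sampled <= 0 or not 0 <= confirmed <= sampled:
        raise ValueError("need 0 <= confirmed <= sampled and sampled > 0")
    p = confirmed / sampled
    denom = 1 + z**2 / sampled
    centre = (p + z**2 / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z**2 / (4 * sampled**2)) / denom
    return p, centre - half, centre + half


# Hypothetical adjudication: 92 of 100 sampled patients confirmed.
ppv, lower, upper = ppv_with_wilson_ci(92, 100)
print(f"PPV {ppv:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Reporting the interval alongside the point estimate makes the sample size visible: the same PPV from 25 charts and from 250 charts supports very different levels of confidence.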

Sakara Digital perspective: The highest-leverage decision in RWD cohort design is whether to build on standard concepts or on source codes. Programs that build on source codes for “control” or “reproducibility” reasons consistently find themselves in expensive remediation work after the second or third annual code update. Programs that build on standard concepts, accept the OMOP vocabulary as the abstraction layer, and document clinical intent separately produce cohorts that age materially better and require less ongoing maintenance.

Building a Cohort Monitoring Program

The design patterns reduce drift risk, but they do not eliminate it. A monitoring program is what catches the residual drift that the design patterns miss. A workable cohort monitoring program has five components.

Scheduled DQD runs. The Data Quality Dashboard should be executed on a defined cadence — typically aligned with the data refresh cycle, often monthly or quarterly — and the results reviewed by data engineering and clinical analytics. Results should be tracked over time, not just reviewed in isolation, so trends become visible.

Cohort prevalence tracking. Each active cohort should have its prevalence tracked over time, with expected variation defined and flagged when prevalence falls outside expected bounds. A prevalence drop of 20% in a single quarter, for a cohort that historically varied by 2-5%, is a signal worth investigating before downstream analyses depend on the recent data.

Code-frequency monitoring. For the source codes underlying each cohort’s concept set, frequencies should be tracked over time. Sudden frequency changes — particularly drops — typically indicate either source system practice changes or upstream coding changes that the concept set vocabulary has not yet been updated to reflect.

Annual ICD-10-CM update review. Each annual ICD-10-CM update should be reviewed for impact on active cohort definitions. The review produces a documented assessment for each affected cohort, with the decision (no change required, monitor for impact, update the definition, full re-engineering) and the rationale.

Phenotype validation refresh. For cohorts with active downstream analyses, periodic phenotype validation refresh provides ongoing evidence that the cohort definition continues to capture the intended clinical population. The cadence varies by cohort criticality; annual refresh is a reasonable default for high-stakes cohorts.

The monitoring program produces signals; the response procedures produce decisions. Signals without response procedures accumulate as background noise that quality teams learn to ignore. The discipline of defined response procedures — when to escalate, who decides, what documentation is produced — is what makes the monitoring program operationally effective.
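The prevalence-tracking component described above can be sketched as a simple period-over-period check. The 5% tolerance mirrors the example of a cohort that historically varied by 2-5%; the function name, the tolerance default, and the sample series are assumptions, and a production monitor would likely use a longer baseline than the previous period alone.

```python
def flag_prevalence_shifts(series, tolerance=0.05):
    """Flag periods whose relative change from the prior period exceeds tolerance.

    series: chronologically ordered list of (period_label, prevalence).
    Returns a list of (period_label, relative_change) for flagged periods.
    """
    flags = []
    for (_, previous), (period, current) in zip(series, series[1:]):
        if previous == 0:
            continue  # relative change is undefined against a zero baseline
        change = (current - previous) / previous
        if abs(change) > tolerance:
            flags.append((period, round(change, 3)))
    return flags


# Hypothetical quarterly prevalence for one cohort.
series = [("2024Q1", 0.0200), ("2024Q2", 0.0206), ("2024Q3", 0.0202), ("2024Q4", 0.0160)]
flags = flag_prevalence_shifts(series)
print(flags)
```

A flag is only a signal. Whether the drop in the final quarter reflects a code update, a source system change, or a real population shift is the question the response procedure has to answer.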

Governance, Documentation, and Regulatory Defensibility

The methodological work on cohort drift connects to a broader governance question: how does a sponsor demonstrate to a regulator that the cohort underlying a regulatory submission is reliable? The recent emergence of AI-powered cohorting platforms, discussed in MedCity News coverage of AI-powered cohorting in RWE, raises the methodological stakes further: AI-driven cohort selection introduces additional sources of variability that traditional rule-based cohorts do not have.

For pharma sponsors, the governance package that supports regulatory defensibility includes:

  • Documented cohort definition with clinical intent, code implementation, vocabulary version, and date of definition
  • Validation evidence from phenotype validation, including the sampling methodology, adjudication results, and positive predictive value estimates
  • Drift register documenting code changes that have affected the cohort and the assessment and response for each
  • Monitoring output from DQD and Achilles, with the trend lines that demonstrate ongoing conformance, completeness, and plausibility
  • Decision documentation for cohort updates, including the rationale and the impact assessment
  • Reproducibility evidence demonstrating that the cohort, executed against the same data and the same vocabulary version, produces consistent results
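One lightweight form of reproducibility evidence is a fingerprint of the cohort membership together with the versions that produced it. The sketch below is one way to do this with a SHA-256 digest; the function name, the payload layout, and the version labels are hypothetical.

```python
import hashlib
import json


def cohort_fingerprint(person_ids, definition_version, vocabulary_version):
    """Deterministic digest of cohort membership plus the versions behind it."""
    payload = json.dumps(
        {
            "definition_version": definition_version,
            "vocabulary_version": vocabulary_version,
            # Sort and de-duplicate so execution order does not affect the digest.
            "person_ids": sorted(set(person_ids)),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical version labels and person identifiers.
run_1 = cohort_fingerprint([3, 1, 2], "cohort-A v4", "vocab-2025-08")
run_2 = cohort_fingerprint([2, 3, 1, 1], "cohort-A v4", "vocab-2025-08")
run_3 = cohort_fingerprint([1, 2, 3], "cohort-A v4", "vocab-2026-02")
print(run_1 == run_2, run_1 == run_3)
```

Matching digests show that two executions produced the same membership under the same versions; a mismatch says that something differs but not what, so the digest complements a membership diff and does not replace it.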

The CDISC organization’s work on real-world data standards, summarized in the CDISC RWD Connect Qualitative Delphi Survey Report, signals where the regulatory expectations are heading: toward standardized representations of cohort definitions and quality evidence that allow regulators to evaluate cohort reliability through documented disciplines rather than through case-by-case forensic review.

Where AI-powered cohorting changes the picture

The emergence of AI-powered cohorting platforms is materially changing the methodological landscape. Traditional rule-based cohorts have the property that the cohort is the rule: given the same rule, the same data, and the same vocabulary, the cohort is deterministic. AI-powered cohorts, by contrast, may produce different memberships for the same input depending on model state, training data freshness, and prompt context. This is a categorically different methodological situation, and regulatory frameworks have not yet caught up to it.

Sponsors deploying AI-powered cohorting in regulatory-adjacent work should anticipate that the documentation burden is materially higher than for rule-based cohorts. Model versioning, prompt versioning, training data lineage, and reproducibility evidence are all required. The FDA’s January 2025 draft guidance on AI for regulatory decision-making, with its credibility framework, provides the most relevant regulatory anchor; sponsors should map their AI-cohort documentation to the credibility framework’s seven steps.

The skills bottleneck for cohort quality work

A practical implementation point that is often underweighted: the QA capacity for cohort quality work requires staff who understand both clinical epidemiology and the OMOP/OHDSI tooling. This is a narrow skill set, and the talent market is not producing it at scale. Programs that assume they can train QA staff into the role in 90 days consistently find that meaningful expertise takes 12-18 months to develop. The implication is that programs should invest in cohort quality capacity development well before they need it, and should consider external partnerships with specialized consultancies for the first major cohort program.

The relationship between cohort quality and study reproducibility

Cohort drift is one of several factors that determine whether observational studies are reproducible across institutions and across time. The OHDSI community’s network research model — where studies are executed in parallel across multiple institutions running OMOP CDM — depends on cohort definitions that can be reliably executed in different contexts. Sponsors building cohorts for network studies face an additional constraint: the cohort must perform consistently across institutions that may have different upstream coding practices, different vocabulary versions, and different source systems. Building for network reproducibility from the start is materially less expensive than retrofitting reproducibility after the fact.

The strategic implication is that RWD cohort quality is not a technical detail to be handled by data engineering. It is a methodological discipline that affects regulatory submissions, network research participation, and the scientific defensibility of the evidence sponsors produce. Quality leaders treating it accordingly produce better evidence and face less regulatory friction than leaders treating it as background infrastructure.

References & Sources

  1. Variations in Using Diagnosis Codes for Defining Age-Related Macular Degeneration Cohorts — PubMed Central / Ophthalmology Science. Source for the 72% incomplete-code and 21% exact-code statistics on cohort definition variation in observational research.
  2. The Book of OHDSI — Chapter 15: Data Quality — OHDSI Community. Authoritative reference for the conformance/completeness/plausibility framework and the checkCohortSourceCodes function.
  3. Increasing Trust in Real-World Evidence Through Evaluation of Observational Data Quality — Blacketer et al., JAMIA / PubMed Central (2021). The article that articulated the three-dimension data quality framework for OMOP-based observational research.
  4. FY 2026 ICD-10-CM Code Updates — HIAcode. Practitioner summary of the FY 2026 annual update mechanics that drive cohort drift in pharma RWE programs.
  5. OHDSI Stack Implementation Guide: Achilles, DQD, WebAPI, Atlas, and ARES — The Build Log. Operational reference for deploying the OHDSI quality tooling stack against an OMOP CDM instance.
  6. AI-Powered Cohorting Is Quietly Reshaping How Real-World Evidence Gets Built — MedCity News (April 2026). Industry-level view of how AI-driven cohort selection is changing the methodological landscape for RWE.
Amie Harpe, Founder and Principal Consultant
Amie Harpe is a strategic consultant, IT leader, and founder of Sakara Digital, with 20+ years of experience delivering global quality, compliance, and digital transformation initiatives across pharma, biotech, medical device, and consumer health. She specializes in GxP compliance, AI governance and adoption, document management systems (including Veeva QMS), program management, and operational optimization — with a proven track record of leading complex, high-impact initiatives (often with budgets exceeding $40M) and managing cross-functional, multicultural teams. Through Sakara Digital, Amie helps organizations navigate digital transformation with clarity, flexibility, and purpose, delivering senior-level fractional consulting directly to clients and through strategic partnerships with consulting firms and software providers. She currently serves as Strategic Partner to IntuitionLabs on GxP compliance and AI-enabled transformation for pharmaceutical and life sciences clients. Amie is also the founder of Peacefully Proven (peacefullyproven.com), a wellness brand focused on intentional, peaceful living.

