PHDC Episodes & TB Classification
Analysis & Algorithm Design
A comprehensive approach to designing phenotype algorithms for episodes in the Provincial Health Data Centre (PHDC), with diabetes mellitus as a worked example, plus analysis of TB outcomes, treatment statuses, and TB Treatment Action Lists (TTAL).
Luqmaan Mohamed
Contents
Comprehensive analysis covering Question 1 (phenotype algorithms) and Question 2 (TB outcomes & patient classification)
PHDC & Episodes
Conceptual Overview
Phenotype Algorithm
7-Step Design Process
Diabetes Mellitus
Worked Example Algorithm
TB Outcomes & Statuses
Definitions & Value
TB Treatment Action List
Operational Impact
Patient Classification
Cases A–D Analysis
Assessment Coverage
- →PHDC context & episodes framework
- →7-step algorithm design process
- →Diabetes mellitus evidence table & scoring
- →Validation approach & key insights
- →Stakeholder consultation approach
- →Limitations & mitigation strategies
- →TB outcome categories (WHO standards)
- →TB treatment statuses (real-time)
- →Value proposition for WCG DoH&W
- →TB Treatment Action List (TTAL) purpose
- →Visual flow diagram of TB cascade
- →Patient A–D classification with explicit reasoning
PHDC & Episodes – High-Level Overview
The Provincial Health Data Centre (PHDC) consolidates person-level clinical data from multiple systems (clinic, hospital, lab, pharmacy, registers) to infer health conditions as 'episodes' over time. Episodes are then enriched into cascades to support clinical care, surveillance, and analytics.
- •Episodes represent health conditions inferred from multiple evidence types.
- •Chronic conditions (e.g. HIV, diabetes) usually have a single lifetime episode.
- •Acute conditions (e.g. TB, pneumonia) can have multiple episodes with defined start/end.
- •Evidence includes: lab tests, drug dispensings, admissions, diagnoses, procedures, and clinical activities.
- •Evidence is weighted by confidence to generate an overall score per patient per condition.
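The weighting idea in the bullets above can be sketched as a simple additive score. This is a toy illustration only, not the PHDC's actual implementation: the evidence items, field names, weights, and the 100-point cap are all assumed for the example.

```python
from datetime import date

# Hypothetical evidence items for one patient and one condition;
# the weights are illustrative, not the PHDC's actual values.
evidence = [
    {"source": "pharmacy",  "item": "repeat A10 dispensing", "date": date(2024, 3, 1), "score": 70},
    {"source": "lab",       "item": "HbA1c >= 48 mmol/mol",  "date": date(2024, 5, 9), "score": 35},
    {"source": "encounter", "item": "ICD-10 E11 diagnosis",  "date": date(2024, 5, 9), "score": 10},
]

def condition_score(items, cap=100):
    """Sum confidence-weighted evidence, capped so that many weak
    signals cannot grow without bound."""
    return min(sum(e["score"] for e in items), cap)

print(condition_score(evidence))  # 100 (70 + 35 + 10, capped)
```

The cap is one way to express that stacked weak signals should converge toward, but not exceed, the certainty of definitive evidence.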
Multiple Data Sources
Lab, Pharmacy, Hospital, Clinic
PHDC Consolidation
Harmonise & Link Data
Episode Inference
Multiple Evidence → Confidence Score
Cascades & Outcomes
Enriched Clinical Insights
Clinical & Analytics Outputs
Reports, Alerts, Tools
Designing a Phenotype Algorithm for Episodes
A condition-agnostic 7-step process that can be applied to any health condition: HIV, diabetes, TB, and beyond.
Define the Condition & Episode
- →Clarify clinical definition
- →Decide if chronic vs acute
- →Define episode start/end
Map Available Data Sources
- →Identify relevant systems
- →Document coverage & frequency
- →Assess data quality
Co-Design Evidence with Stakeholders
- →Collaborate with clinicians & programme managers
- →Internal: data scientists, engineers
- →List candidate evidence items
Define Confidence Levels & Scoring
- →Categorise evidence by confidence
- →Assign numerical scores
- →Define high-certainty thresholds
Define Episode Logic
- →How to start an episode
- →How evidence maintains episode
- →When/how episode ends
Prototype & Validate
- →Implement in test environment
- →Generate line-lists & statistics
- →Chart reviews with clinicians
Governance & Iteration
- →Document algorithm & assumptions
- →Version control changes
- →Periodic clinical review
Data Sources Feeding the Episode Algorithm
Clinic & Hospital Systems
Clinicom, PHCIS, PREHMIS
- ✓Patient registrations & identifiers
- ✓Encounters (visits, admissions, discharges)
- ✓Diagnosis & procedure codes (ICD-10)
Laboratory Data
NHLS
- ✓Diagnostic & monitoring tests
- ✓Highly structured results
- ✓Date-stamped & standardised
Pharmacy Data
JAC, CDU
- ✓Medicines dispensed with ATC codes
- ✓Refill patterns & dates
- ✓Indicates ongoing chronic management
Disease Registers
Specialised Systems
- ✓Electronic TB/HIV registers
- ✓Chronic disease club lists
- ✓Programme-specific tracking
Mortality & Outcomes
Vital Registration
- ✓Dates of death
- ✓Cause-of-death information
- ✓Episode closure indicator
Community & Other
CHW Systems, mHealth
- ✓Community health worker activities
- ✓mHealth programmes
- ✓Outreach & engagement data
Evidence Confidence & Scoring
High Confidence
Strongly implies the condition by itself
Examples:
- • Repeated diabetes drugs (ATC A10) over time
- • 2+ HbA1c results ≥ 48 mmol/mol on different dates
Usage: Can create or maintain episode on its own
Weak–Moderate
Some indication, but may be noisy
Examples:
- • Single raised HbA1c above diagnostic threshold
- • Single ICD-10 diabetes diagnosis
Usage: Needs combination with other evidence
Supporting
Non-specific but increases confidence
Examples:
- • ACE inhibitors + statins with other DM evidence
- • Repeated capillary glucose during admission
Usage: Boosts overall score; not used alone
Negating
Suggests prior inference may be incorrect
Examples:
- • Explicit "no diabetes" note with normal tests
- • Evidence of remission after bariatric surgery
Usage: Subtracts from score; may close episode
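A minimal sketch of how negating evidence could interact with positive scores; the floor-at-zero rule is an assumption for illustration, not a documented PHDC rule.

```python
def net_score(items):
    """Positive evidence adds, negating evidence subtracts; floored
    at zero so strong negation can pull a patient below threshold."""
    return max(sum(items), 0)

# Two weak diabetes signals (35 + 10) cancelled by an explicit
# "diabetes excluded" note (-50):
print(net_score([35, 10, -50]))  # 0 -> below the 'possible' threshold of 35
```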
Example: Diabetes Mellitus Phenotype Algorithm
Episode Definition
Condition:
Diabetes mellitus (Type 1 & Type 2)
Episode Type:
Chronic (one lifetime episode per person)
Episode Start:
First date when evidence score ≥90 OR first high-confidence evidence
Episode End:
Date of death OR explicit remission evidence
Evidence Items & Scoring
Pharmacy Evidence (JAC, CDU)
| Evidence Description | Confidence | Score | Notes |
|---|---|---|---|
| ≥2 dispensings of diabetes drug (ATC A10*) on separate dates within 12 months | High | 70 | Strong indicator of chronic diabetes - implies diagnosed + engaged in care |
| ≥1 insulin dispensing (A10A*) with repeat within 12 months | High | 80 | Very strong evidence - insulin highly specific to diabetes |
| Single dispensing of diabetes drug (A10*) | Weak–Moderate | 35 | Could be trial or misclassified |
| Concomitant statin + ACEI/ARB with other DM evidence | Supporting | 10 | Suggests cardiovascular risk management in diabetic patient |
Laboratory Evidence (NHLS)
| Evidence Description | Confidence | Score | Notes |
|---|---|---|---|
| ≥2 HbA1c results ≥ 48 mmol/mol (6.5%) on separate dates within 12 months | High | 70 | WHO/ADA diagnostic threshold repeated - meets clinical criteria |
| ≥2 fasting plasma glucose ≥ 7.0 mmol/L on different days within 6 months | High | 70 | WHO diagnostic criterion for diabetes - repeated confirmation |
| Single HbA1c ≥ 48 mmol/mol or random glucose ≥ 11.1 mmol/L | Weak–Moderate | 35 | Single elevated result - could be screening, stress, or illness |
| Borderline HbA1c (42–47 mmol/mol) with strong pharmacy evidence | Supporting | 10 | Pre-diabetes range or controlled diabetes; supports existing evidence |
Encounters & Diagnoses (Clinicom, PHCIS, PREHMIS)
| Evidence Description | Confidence | Score | Notes |
|---|---|---|---|
| ≥2 encounters with ICD-10 E10–E14 as primary diagnosis | Weak–Moderate | 35 | Diagnosis coding often incomplete in WC public sector (per PHDC paper) |
| ≥3 clinic visits as "diabetes clinic" or chronic club | Weak–Moderate | 35 | Indicates clinical engagement with DM care pathway |
| Single ICD-10 E10–E14 code | Supporting | 10 | Weak evidence alone due to incomplete coding practices |
Negating Evidence
| Evidence Description | Confidence | Score | Notes |
|---|---|---|---|
| ≥2 normal HbA1c (<42 mmol/mol) after previous DM inference, with no DM drugs for >12 months | Negating | -40 | May indicate misclassification or rare remission (e.g. post-bariatric surgery) |
| Explicit "diabetes excluded" or "no diabetes" note in discharge summary with corroborating normal labs | Negating | -50 | Clinical documentation contradicts DM inference |
Scoring & Thresholds
- →High-certainty episode: Score ≥90 in 12-month window OR single high-confidence evidence
- →Possible/low-certainty: Score 35–89 (used for analytics, not clinical tools)
- →No episode: Score < 35 OR strong negating evidence
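The three thresholds above can be expressed as one small decision function. This is a sketch: the rule that negating evidence overrides even high-confidence items is a design assumption, consistent with the negating-evidence tier described earlier.

```python
def classify_episode(score, has_high_confidence=False, has_strong_negation=False):
    """Tiered certainty per the thresholds above (illustrative)."""
    if has_strong_negation:          # negation overrides other evidence
        return "no_episode"
    if score >= 90 or has_high_confidence:
        return "high_certainty"      # safe for clinical tools
    if score >= 35:
        return "possible"            # analytics use only
    return "no_episode"

print(classify_episode(70))                            # possible
print(classify_episode(70, has_high_confidence=True))  # high_certainty
```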
Validation & Iteration Approach
The algorithm must be validated to ensure high-certainty episodes are clinically accurate and usable for both clinical tools and analytics.
Chart Review (Gold Standard Approximation)
Sample 100 patients flagged as "high-certainty DM" and 50 "low-certainty" across 3-5 different facilities (urban/rural mix)
Metric: Positive Predictive Value (PPV)
Target: >95% for high-certainty episodes; 70-85% for low-certainty
Register Comparison
Compare algorithm output to existing chronic disease registers, acknowledging that neither is a perfect gold standard
Metric: Sensitivity & Coverage
Target: Identify patients flagged by the algorithm but absent from registers (possible under-registration), and vice versa
Clinical Feedback Loop
Pilot algorithm outputs with clinicians at 3-5 facilities; gather feedback on false positives/negatives and actionability
Metric: Qualitative feedback + face validity
Target: Clinicians confirm >90% of high-certainty cases are true diabetics they recognise
Epidemiological Plausibility
Compare prevalence estimates to national surveys (SADHS), assess age/sex distributions vs expected patterns
Metric: Population-level concordance
Target: Prevalence within 10-15% of survey estimates; age distribution matches known epidemiology
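The chart-review metric above reduces to a simple proportion. The numbers below are hypothetical, chosen only to show how the >95% target would be checked after a review.

```python
def ppv(true_positives, total_flagged):
    """Positive predictive value from a chart-review sample."""
    return true_positives / total_flagged

# Hypothetical review: 96 of 100 high-certainty flags confirmed as true DM.
print(f"PPV = {ppv(96, 100):.0%}")  # PPV = 96% -> meets the >95% target
```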
Why Validation Matters
- ✓Clinical trust: Clinicians must trust algorithm outputs to use them in patient care decisions
- ✓Iterative improvement: Feedback reveals edge cases and data quality issues for refinement
- ✓Research validity: Documented PPV/sensitivity enables proper interpretation of epidemiological analyses
Key Insights from the Diabetes Algorithm Exercise
Practical learnings from designing a phenotype algorithm for the PHDC context
Multi-Source Triangulation Compensates for Data Gaps
PHDC's approach of combining pharmacy, lab, and encounter data overcomes known limitations in diagnosis coding quality (documented in the PHDC paper). No single source is perfect, but multiple weak signals converge to high-confidence inferences.
Pharmacy Data = Proxy for Clinical Engagement
Repeated medicine dispensing (especially insulin) scores highly because it reflects not just diagnosis but active patient engagement in care—a key indicator of chronic disease management in the SA public sector context.
Tiered Certainty Enables Dual Use
High-certainty episodes (score ≥90) are safe for clinical decision-support tools (e.g., alerts, patient lists). Low-certainty episodes (35-89) support epidemiological research where sensitivity matters more than specificity, enabling bias analysis.
Validation Builds Trust and Drives Iteration
Chart review and clinical feedback loops aren't just validation—they reveal edge cases (e.g., gestational diabetes misclassified as Type 2, diet-controlled patients with no pharmacy data) that improve the algorithm and earn clinician buy-in.
Context-Specific Weights Reflect Local Reality
Scoring must reflect WC public sector realities: incomplete coding, varying pharmacy digitisation across facilities, and diagnostic pathways (e.g. random glucose used more than OGTT in primary care). Algorithm weights are not universal—they're calibrated to the data landscape.
Negating Evidence Prevents "Once Diabetic, Always Diabetic"
Including negating evidence (normal labs after previous inference, explicit clinical exclusion) allows the algorithm to self-correct and close false-positive episodes—critical for maintaining data quality and clinical credibility over time.
Questions for Stakeholders
Phenotype algorithm design is a consultative, collaborative process. Here are key questions I'd ask each stakeholder group to inform evidence selection, scoring, and validation.
Clinicians & Clinical Programme Managers
Do you trust random glucose ≥11.1 mmol/L as diagnostic for diabetes, or only fasting glucose?
Why ask: Random glucose is easier to collect in PHC settings (no fasting required), but may have lower specificity. Need to understand local diagnostic pathways.
What proportion of your diabetic patients are diet-controlled only, with no medication?
Why ask: These patients will not appear in pharmacy data. Helps quantify sensitivity gap and whether we need alternative evidence sources.
How do you currently identify diabetic patients who are lost to follow-up?
Why ask: Understanding existing workflows helps ensure PHDC outputs integrate with (rather than duplicate) current practices.
What would make you trust an algorithm-generated patient list enough to use it clinically?
Why ask: Uncovers concerns about false positives/negatives and desired confidence thresholds for actionability.
Data Scientists & Data Engineers
How reliable and complete is ATC coding in the pharmacy systems (JAC, CDU)?
Why ask: Determines whether we can trust A10* codes or need manual validation of drug lists.
What's the lag between service delivery and data availability in the PHDC?
Why ask: Impacts whether we can use the algorithm for real-time clinical alerts or only retrospective reporting.
How should we handle patients with multiple folder numbers (PMI duplicates)?
Why ask: Determines the technical approach: probabilistic linkage, manual review thresholds, or accepting some duplication.
What's the compute cost of scoring all 8M patients daily vs weekly batches?
Why ask: Balances timeliness vs infrastructure costs; informs refresh frequency decisions.
Public Health / Epidemiology Teams
How does our algorithm-derived diabetes prevalence compare to SADHS survey estimates?
Why ask: Validates population-level plausibility; large discrepancies suggest systematic issues.
Should we distinguish Type 1 vs Type 2 diabetes, or is conflation acceptable?
Why ask: Affects algorithm complexity. Type 1 is rare and hard to identify from routine data; may not be worth the effort unless critical.
How do we want to handle gestational diabetes—separate phenotype or exclusion criterion?
Why ask: GDM has different clinical significance; need clear decision on whether to flag separately or exclude from general DM algorithm.
What's the acceptable positive predictive value (PPV) for research use vs clinical use?
Why ask: Research may tolerate 80% PPV for sensitivity; clinical tools need 95%+. Sets different thresholds for high/low-certainty episodes.
Facilities & Operational Managers
Which facilities have the lowest pharmacy digitisation coverage?
Why ask: Identifies where algorithm will under-count; can flag these facilities for targeted data quality improvement.
How often do diagnosis codes get entered retrospectively vs at point-of-care?
Why ask: Affects whether we can rely on ICD-10 codes; retrospective coding is often less complete.
Would your staff use an algorithm-generated "diabetes patient list" for recall campaigns?
Why ask: Tests operational feasibility and user buy-in; reveals workflow integration barriers.
What's the current process when a patient transfers between facilities?
Why ask: Understanding transfer documentation helps assess risk of double-counting or loss-to-follow-up misclassification.
The Value of Asking Questions
Limitations & Challenges
Designing phenotype algorithms for the PHDC is not without challenges. Recognising limitations upfront enables proactive mitigation and realistic expectations.
Data Quality & Completeness
Specific Challenges:
- ⚠Incomplete diagnosis coding: ICD-10 coding is often missing or inaccurate, limiting its utility as primary evidence
- ⚠Pharmacy data gaps: Not all facilities have digitised dispensing; diet-controlled diabetics have no pharmacy footprint
- ⚠Laboratory coverage: Some facilities lack consistent lab ordering; rural areas may have lower testing rates
Mitigation Strategy:
Use multi-source triangulation; validate against facility-level data quality metrics; set lower confidence for facilities with known gaps
Patient Master Index (PMI) Linkage
Specific Challenges:
- ⚠Duplicates: Same patient with multiple folder numbers due to registration errors
- ⚠Name variations: Nicknames, maiden names, spelling inconsistencies complicate matching
- ⚠Missing identifiers: Incomplete ID numbers or contact details hinder linkage accuracy
Mitigation Strategy:
Probabilistic linkage with fuzzy matching; manual review of high-volume duplicates; blacklist/whitelist for known errors (per PHDC paper)
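A toy illustration of the fuzzy-matching idea using Python's standard library. A production PMI linkage would score several fields (name, date of birth, ID fragments) probabilistically; the names and the 0.8 cut-off here are assumptions for the example.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Crude string similarity for scoring candidate duplicate pairs."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical registrations: a likely duplicate vs a clearly distinct pair.
for a, b in [("Thandiwe Nkosi", "Thandi Nkosi"), ("Thandiwe Nkosi", "Sipho Dlamini")]:
    flag = "flag for review" if name_similarity(a, b) >= 0.8 else "distinct"
    print(f"{a!r} vs {b!r}: {flag}")
```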
Privacy & Governance
Specific Challenges:
- ⚠Balancing clinical utility vs patient protection: Named data needed for clinical tools but strict consent required for research
- ⚠Risk of re-identification: Even anonymized data with facility/date granularity can potentially identify individuals
- ⚠Consent fatigue: Patients may not understand or consent to secondary use of data
Mitigation Strategy:
Privacy-by-design architecture (separate patient/clinical DBs); tiered access controls; explicit patient information campaign with opt-out option
Algorithm Maintenance & Drift
Specific Challenges:
- ⚠Clinical practice evolution: Guideline changes (e.g., new HbA1c thresholds) require algorithm updates
- ⚠Source system changes: New systems, retired codes, or data structure changes break pipelines
- ⚠Performance decay: Algorithm accuracy may drift as population or care patterns change over time
Mitigation Strategy:
Version control for algorithms; automated data quality monitoring; annual validation with clinician review; feedback loops from users
Edge Cases & Misclassification
Specific Challenges:
- ⚠Gestational diabetes: May be misclassified as Type 2 if pharmacy/lab evidence overlaps with pregnancy window
- ⚠Steroid-induced hyperglycaemia: Transient elevated glucose in hospitalised patients on corticosteroids
- ⚠Type 1 vs Type 2: Difficult to distinguish in routine data; age is a weak proxy
Mitigation Strategy:
Explicit rules for gestational DM (link to pregnancy episodes); exclude glucose during steroid Rx; accept Type 1/2 conflation unless critical for analysis
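The steroid-exclusion rule in the mitigation strategy above can be sketched as a filter over glucose results. The 14-day washout window and the field names are assumptions for illustration, not documented PHDC parameters.

```python
from datetime import date, timedelta

def usable_glucose(glucose_results, steroid_dispensings, washout_days=14):
    """Drop glucose results taken during or shortly after a
    corticosteroid dispensing (washout window is an assumption)."""
    windows = [(d, d + timedelta(days=washout_days)) for d in steroid_dispensings]
    return [g for g in glucose_results
            if not any(start <= g["date"] <= end for start, end in windows)]

results  = [{"date": date(2025, 2, 1), "mmol_l": 12.4},   # during steroid course
            {"date": date(2025, 5, 10), "mmol_l": 11.8}]  # months later
steroids = [date(2025, 1, 28)]
print(usable_glucose(results, steroids))  # only the May result survives
```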
Resource & Capacity Constraints
Specific Challenges:
- ⚠Human capacity: Requires skilled data scientists/analysts—not traditionally in DoH staffing structures
- ⚠Compute resources: Large-scale daily processing of 8M patients requires significant infrastructure
- ⚠Stakeholder time: Clinicians and programme managers have limited availability for algorithm co-design
Mitigation Strategy:
Invest in data science hiring/training; leverage cloud/scalable infrastructure; use iterative co-design (short sprints, not long workshops)
Why Acknowledge Limitations?
TB Outcomes & Treatment Statuses
TB Outcome Categories
Retrospective classification of how a TB episode ended (final determination).
Cured
Bacteriologically confirmed TB with negative smear/culture in the last month of treatment and on at least one previous occasion
Treatment Completed
Patient completed treatment per protocol but lacks bacteriological cure criteria
Treatment Failed
Regimen terminated due to lack of response, resistance, or side effects
Died
Patient died before or during TB treatment (any cause)
Lost to Follow-up
Did not start treatment OR treatment interrupted for ≥2 consecutive months
Not Evaluated
No treatment outcome assigned (includes transfers with unknown final outcome)
TB Treatment Statuses (Real-time)
Current status showing where each patient is in the treatment cascade today.
Never Started
Evidence of TB but no treatment start recorded
On TB Treatment
Treatment started with recent activity (e.g. visit or medicine pickup within 60 days)
LTFU Before Treatment
TB diagnosed but no treatment within 2 months of diagnosis
LTFU on Treatment
Treatment started but no activity for ≥2 months
Completed Treatment
Completed TB regimen as per guidelines
Died
Deceased after TB diagnosis (with or without treatment)
Transferred Out
Left Western Cape; outcome unknown to provincial service
TB Treatment Cascade: Patient Flow
TB Treatment Cascade Flow Diagram
Patient flow through TB diagnosis, treatment, and outcome stages (as of assessment date: 30 Aug 2025)
Status: LTFU Before Rx
Status: LTFU from Rx
Status: Completed
Other Possible Outcomes (Override Above Logic)
Legend & Key Assumptions
Why TB Outcomes & TTAL Matter for WCG DoH&W
TB Outcome Categories
- ✓Monitor programme performance (success rate, LTFU, death, failure)
- ✓Identify weakest points in the TB cascade
- ✓Support equity analysis by facility, district, age, HIV status
- ✓Enable research & evaluation of interventions
TB Treatment Action List (TTAL)
- ✓Daily/weekly updated actionable line list of patients needing follow-up
- ✓Ensures diagnosed patients are linked to and retained in treatment
- ✓Reduces LTFU through targeted tracing and outreach
- ✓Gives facility staff concrete workload: who to call, visit, or book
Patient-Level Classification – Cases A–D
Classification as of 30 August 2025 based on TB evidence, treatment dates, and activity timelines.
Patient A
Timeline
- •TB evidence: Multiple dates (May 2023, Jan 2024, Dec 2024, Aug 2025)
- • → Likely GeneXpert MTB detected + follow-up cultures
- •Treatment start: Multiple TB treatment dates across 2023-2025
- • → Standard 6-month regimen with documented dispensing
- •Treatment success: 1 Aug 2025 (rx_success_date)
- • → Bacteriological cure or completion documented
- •Date of death: 26 Aug 2025 (25 days after treatment success)
TB Outcome:
Treatment Success (Cured/Completed)
Current Status:
Died
Evidence Quality:
Strong: Multiple lab results + treatment + documented success
Reasoning: Completed TB treatment successfully with documented cure/completion. Death occurred after TB episode was closed as successful. Per WHO definitions, TB outcome is "cured/completed" even though patient subsequently died (possibly from other causes or advanced HIV/comorbidities).
Patient B
Timeline
- •TB evidence: 21 Apr 2021
- • → Possible GeneXpert, CXR suggestive, or clinical diagnosis
- •No recorded TB treatment start (no tb_treatment_date or phc_treatment_date)
- •No subsequent TB-related activity in 4+ years
- •No death or transfer out recorded (as of 30 Aug 2025)
TB Outcome:
Lost to Follow-up (Before Treatment)
Current Status:
LTFU Before Treatment
Evidence Quality:
Weak: Single old evidence point, no corroboration
Reasoning: TB evidence recorded >4 years ago, but no treatment initiation in available PHDC data. Well beyond the 60-day window for treatment start. Patient either: (1) never linked to care, (2) sought care outside WC public sector, or (3) data capture error. This is a high-priority case for TTAL-style tracing if patient is still contactable.
Patient C
Timeline
- •TB evidence: 23 Sep 2024
- • → Lab test or clinical diagnosis documented
- •PHC visit: 23 Sep 2024 (same day as evidence)
- • → Patient presented to PHC, likely counseled
- •No TB treatment start recorded (no tb_treatment_date or phc_treatment_date)
- •~11 months elapsed with no treatment initiation (diagnosis to 30 Aug 2025)
TB Outcome:
Lost to Follow-up (Before Treatment)
Current Status:
LTFU Before Treatment
Evidence Quality:
Moderate: Recent evidence + visit, but no follow-through
Reasoning: Patient had initial encounter with TB evidence and same-day PHC visit, suggesting diagnosis/counseling occurred. However, no treatment start documented in the subsequent 11 months. Possible reasons: patient declined treatment, was referred but never attended, or treatment started but not captured in PHDC. This represents a critical gap in the cascade and should trigger active patient tracing.
Patient D
Timeline
- •TB evidence: 12 Oct 2024
- • → Diagnostic test indicating TB (GeneXpert/culture/CXR)
- •PHC treatment start: 23 Jan 2025 (~3 months after evidence)
- • → Delay suggests patient initially LTFU, then re-engaged
- •Last recorded activity: 23 Mar 2025 (visit or dispensing)
- •~160 days (5+ months) with no activity or treatment success (to 30 Aug 2025)
- •No rx_success_date, no death, no transfer flags
TB Outcome:
Lost to Follow-up (On Treatment)
Current Status:
LTFU from Treatment
Evidence Quality:
Strong early, weak late: Treatment started but not maintained
Reasoning: Patient started TB treatment but last activity was >160 days ago with no documented treatment success. Standard TB regimen is 6 months; by Aug 2025, patient should have completed treatment (started Jan 2025). Absence of activity and no success flag indicates patient defaulted/interrupted treatment. High priority for contact tracing and re-engagement, especially to assess for drug resistance if treatment was incomplete.
Classification Assumptions
- •Episode exists if at least one TB evidence date is present
- •Treatment started if either TB treatment date or PHC treatment date is recorded
- •Last activity = latest date among treatment start, visit, or activity dates
- •LTFU from treatment if no activity for >60 days before assessment date
- •LTFU before treatment if evidence date >60 days ago with no treatment start
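The assumptions above can be turned into a small classifier and checked against Patients A–D. This is a sketch: the argument names are illustrative, not actual PHDC column names, and the ordering of checks (death overrides success, success overrides LTFU) follows the override logic noted in the cascade diagram.

```python
from datetime import date, timedelta

ASSESSMENT = date(2025, 8, 30)
WINDOW = timedelta(days=60)  # LTFU window from the assumptions above

def tb_status(evidence_date, treatment_start=None, last_activity=None,
              success_date=None, death_date=None, assessed=ASSESSMENT):
    """Current TB treatment status per the classification assumptions."""
    if death_date and death_date <= assessed:
        return "Died"
    if success_date:
        return "Completed Treatment"
    if treatment_start is None:
        if assessed - evidence_date > WINDOW:
            return "LTFU Before Treatment"
        return "Never Started"
    # Last activity = latest of treatment start and any recorded activity.
    last = max(d for d in (treatment_start, last_activity) if d)
    if assessed - last > WINDOW:
        return "LTFU from Treatment"
    return "On TB Treatment"

# Patient A: treatment success 1 Aug 2025, died 26 Aug 2025 -> Died
print(tb_status(date(2023, 5, 1), treatment_start=date(2023, 6, 1),
                success_date=date(2025, 8, 1), death_date=date(2025, 8, 26)))
# Patients B and C: evidence only, no treatment start -> LTFU Before Treatment
print(tb_status(date(2021, 4, 21)))
print(tb_status(date(2024, 9, 23)))
# Patient D: last activity 23 Mar 2025, 160 days before assessment
print(tb_status(date(2024, 10, 12), treatment_start=date(2025, 1, 23),
                last_activity=date(2025, 3, 23)))  # LTFU from Treatment
```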
Sources & References
This assessment draws on clinical guidelines, PHDC documentation, and epidemiological literature to inform algorithm design and TB outcome analysis.
PHDC Architecture & Methods
Boulle A, Heekes A, Tiffin N, et al. Data Centre Profile: The Provincial Health Data Centre of the Western Cape Province, South Africa. International Journal of Population Data Science. 2019;4(2):06.
Diabetes Diagnostic Criteria
World Health Organization. Use of Glycated Haemoglobin (HbA1c) in the Diagnosis of Diabetes Mellitus. WHO/NMH/CHP/CPM/11.1. Geneva: WHO; 2011.
American Diabetes Association. Classification and Diagnosis of Diabetes: Standards of Medical Care in Diabetes—2024. Diabetes Care. 2024;47(Supplement 1):S20-S42.
SEMDSA Type 2 Diabetes Guidelines Expert Committee. SEMDSA 2017 Guidelines for the Management of Type 2 Diabetes Mellitus. Journal of Endocrinology, Metabolism and Diabetes of South Africa. 2017;22(1):S1-S196.
ATC Drug Classification
WHO Collaborating Centre for Drug Statistics Methodology. ATC/DDD Index 2024. Oslo, Norway.
TB Outcomes & Treatment Standards
World Health Organization. Definitions and Reporting Framework for Tuberculosis – 2013 Revision (Updated December 2014). WHO/HTM/TB/2013.2. Geneva: WHO; 2014.
National Department of Health, South Africa. National Tuberculosis Management Guidelines 2014. Pretoria: NDoH; 2014.
Epidemiological Context (South Africa)
Statistics South Africa. South African Demographic and Health Survey 2016. Pretoria: Stats SA; 2017.
Pillay-van Wyk V, Msemburi W, Laubscher R, et al. Mortality trends and differentials in South Africa from 1997 to 2012: second National Burden of Disease Study. Lancet Global Health. 2016;4(9):e642-53.
Methodology: Phenotype Algorithms & Inference
Richesson RL, Hammond WE, Nahm M, et al. Electronic health records based phenotyping in next-generation clinical trials: a perspective from the NIH Health Care Systems Collaboratory. Journal of the American Medical Informatics Association. 2013;20(e2):e226-31.
Note on Source Tracking
As per the assessment instructions, I've kept track of sources and reasoning throughout this analysis. The diabetes algorithm scoring reflects WHO/ADA diagnostic criteria, PHDC-documented data quality issues (e.g., incomplete coding), and local context (SA public sector pharmacy patterns). TB outcome categories follow WHO standard definitions as applied in South African national guidelines. All assumptions and rationale are explicitly documented in the evidence tables and patient classification logic.
Closing Summary
Structured phenotype algorithms allow PHDC to infer conditions and episodes from imperfect, multi-source data, transforming routine clinical and administrative records into actionable health insights.
Confidence-weighted evidence and clear episode rules make outputs useful for both clinical decision-support tools and population-level epidemiological analytics, enabling different levels of certainty.
Well-defined TB outcomes and TTAL line lists translate data into real-world actions that can materially improve patient care, programme performance, and population health outcomes.
Luqmaan Mohamed
Business Analyst Technical Assessment