
Most digital health implementers I know can recite these headline statistics in their sleep.
The pitch decks practically write themselves: AI is the force multiplier that saves global health from austerity. I am increasingly worried that we are mistaking the pitch deck for a plan.
A new Lancet Primary Care article lays out a five-question test that any AI tool aimed at primary health care should pass before it scales up. The authors are not anti-AI. They want this technology to work. And their five questions are implementation design gold.
Here are their questions and what ethical implementers should do about each one.
1. Does the tool relieve the clinician’s hardest moment?
The authors describe a rural nurse seeing more than 70 patients a day, making high-risk calls with thin diagnostics and incomplete records. If your AI tool helps that nurse predict something interesting but does not change what she does next, you have built a workload generator.
What ethical design looks like:
- Co-design with frontline workers from day one, not after the prototype.
- Map the three most cognitively expensive moments in a typical shift (danger sign assessment, drug interaction checks, referral decisions) and build for those.
- Kill any feature that produces a number without producing an action.
Concrete test: If you removed the AI output from the screen, would the worker still know what to do next? If yes, your AI is decoration.
2. Does the system make invisible patients visible?
This is the hardest one, and the one most pilots quietly fail. Patients who move between facilities, miss follow-ups, or use multiple entry points are clinically invisible to fragmented record systems. Train an AI on those records and it inherits the gaps. Worse, it launders them as objectivity.
What ethical design looks like:
- Invest in interoperable identifiers and minimum datasets before the model.
- Build paper-to-digital bridges, because paper is not going away in most rural facilities this decade.
- Validate model performance separately for mobile populations, women, and linguistic minorities, and publish the disaggregated results.
- If the gender-disaggregated AUC (area under the ROC curve) is worse for one group, say so on the product page.
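For teams that have not done disaggregated validation before, here is a minimal sketch of what it could look like. It assumes validation records that carry a true label, a model score, and a subgroup field; the field names (`label`, `score`, `sex`) and the sample data are invented for illustration and do not come from the Lancet article. AUC is computed directly as the probability that a randomly chosen positive case outscores a randomly chosen negative one.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def disaggregated_auc(records, group_key):
    """Compute AUC separately for each value of the subgroup field."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r)
    return {
        g: auc([r["label"] for r in rows], [r["score"] for r in rows])
        for g, rows in groups.items()
    }


# Hypothetical validation records, invented for illustration only.
records = [
    {"sex": "F", "label": 1, "score": 0.6},
    {"sex": "F", "label": 0, "score": 0.7},
    {"sex": "F", "label": 1, "score": 0.8},
    {"sex": "F", "label": 0, "score": 0.3},
    {"sex": "M", "label": 1, "score": 0.9},
    {"sex": "M", "label": 0, "score": 0.2},
    {"sex": "M", "label": 1, "score": 0.7},
    {"sex": "M", "label": 0, "score": 0.4},
]

by_sex = disaggregated_auc(records, "sex")
print(by_sex)  # the toy model ranks cases worse for women (0.75) than for men (1.0)
```

The same function works for any grouping field, such as language or whether the patient has visited more than one facility. Small subgroups will give noisy estimates, so a real report would also need sample sizes and confidence intervals.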
This connects to a problem we wrote about before: most chatbot pilots treat operational integration as an afterthought when it should be the primary design constraint. Visibility is operational integration. It is not a feature.
3. Does the evidence reflect real clinical conditions?
Benchmark accuracy is a core health AI marketing angle. It is also nearly useless for predicting whether a tool will work in a clinic with intermittent power, no pulse oximeter, and one nurse covering three villages. The authors are blunt: a model can improve its score and simultaneously increase consultation time and generate inappropriate referrals.
What ethical design looks like:
- Pragmatic evaluation under routine conditions, not lab benchmarks.
- Report workflow outcomes (consultation time, referral rates, appropriate antibiotic prescribing) alongside accuracy.
- Adopt CONSORT-AI reporting standards as the floor, not the ceiling.
- Disaggregate safety findings by geography, language, and gender.
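To make "workflow outcomes alongside accuracy" concrete, here is a rough sketch of the kind of summary a pilot report could include. The record fields (`minutes`, `referred`, `referral_appropriate`) and both sample datasets are hypothetical; a real evaluation would define these with clinicians and draw them from routine data.

```python
from statistics import median


def workflow_summary(consultations):
    """Summarize consultation time and referral behaviour for a set of visits."""
    n = len(consultations)
    referred = [c for c in consultations if c["referred"]]
    inappropriate = sum(1 for c in referred if not c["referral_appropriate"])
    return {
        "n": n,
        "median_minutes": median(c["minutes"] for c in consultations),
        "referral_rate": len(referred) / n,
        # Share of referrals later judged inappropriate; None if nobody was referred.
        "inappropriate_referral_share": inappropriate / len(referred) if referred else None,
    }


# Hypothetical visits before and after introducing the tool (illustrative only).
baseline = [
    {"minutes": 8, "referred": False, "referral_appropriate": None},
    {"minutes": 10, "referred": True, "referral_appropriate": True},
    {"minutes": 12, "referred": False, "referral_appropriate": None},
    {"minutes": 10, "referred": False, "referral_appropriate": None},
]
with_tool = [
    {"minutes": 11, "referred": True, "referral_appropriate": True},
    {"minutes": 13, "referred": True, "referral_appropriate": False},
    {"minutes": 12, "referred": False, "referral_appropriate": None},
    {"minutes": 14, "referred": True, "referral_appropriate": False},
]

before = workflow_summary(baseline)
after = workflow_summary(with_tool)
print(before)
print(after)
```

In this invented example the tool lengthens consultations and triples the referral rate, with most of the new referrals inappropriate. That is exactly the pattern an accuracy-only report would hide.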
For donors: stop accepting pilot reports that lead with model accuracy. Ask for the workflow data. If the implementer cannot produce it, that is the answer.
4. Can decisions be understood, reviewed, and corrected?
When something goes wrong at the point of care, the frontline worker is the one holding the bag. That does not change because there is a model in the loop. So the question is whether your tool gives that worker the ability to see why a recommendation was made, override it, and trigger review.
What ethical design looks like:
- Clear intended-use statements written for clinicians, not lawyers.
- Documented performance bounds.
- Audit trails that survive staff turnover.
- UI that communicates uncertainty in plain language (“low confidence, consider referral”) rather than false precision.
- A working feedback channel that closes the loop with developers, not a buried “report issue” link.
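A sketch of how two of these ideas might fit together in code: translating a model confidence into plain language, and writing an audit record that captures what the tool said, what the worker did, and whether the case should be reviewed. The thresholds, field names, and action labels are all assumptions made up for this example. Real cut-offs would have to come from the tool's documented performance bounds.

```python
import json
from datetime import datetime, timezone

# Hypothetical thresholds, not taken from any real product or guideline.
LOW, HIGH = 0.6, 0.85


def plain_language(confidence):
    """Turn a numeric confidence into a message a clinician can act on."""
    if confidence >= HIGH:
        return "High confidence"
    if confidence >= LOW:
        return "Moderate confidence, use clinical judgement"
    return "Low confidence, consider referral"


def audit_entry(case_id, recommendation, confidence, worker_action, override_reason=None):
    """Build one audit record; an override is flagged for review, not treated as an error."""
    overridden = worker_action != recommendation
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "case_id": case_id,
        "recommendation": recommendation,
        "confidence": confidence,
        "message_shown": plain_language(confidence),
        "worker_action": worker_action,
        "overridden": overridden,
        "override_reason": override_reason,
        "needs_review": overridden,
    }


entry = audit_entry("case-001", "treat_at_clinic", 0.52, "refer", "danger signs observed")
line = json.dumps(entry)  # one JSON line per decision, suitable for an append-only log
print(line)
```

Storing the message the worker actually saw, rather than only the raw score, is a deliberate choice here: it lets a later reviewer reconstruct the decision even after the thresholds or the staff have changed.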
The WHO regulatory considerations on AI for health are the baseline here. I would argue they are also the ceiling of what most LMIC regulators currently have capacity to enforce, which is precisely why implementers need to internalize these standards rather than wait to be audited.
5. Does the tool protect the workforce, or expose it?
This is the question I find most often missing from implementer pitches. Introducing AI without training, supervision, and defined accountability shifts liability onto clinicians who are already at the breaking point. It also creates moral distress when a worker knows the tool is wrong but has no realistic alternative.
What ethical design looks like:
- A workforce strategy that is funded as a line item, not an afterthought.
- Clinical and digital literacy training that continues past launch.
- Defined supervision structures.
- An explicit safety role at the implementing organization to monitor errors, bias, and model drift.
- Liability arrangements that do not default to “the nurse should have known better.”
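For the safety role, one simple way to watch for drift is to compare the distribution of recent model scores against the scores seen at validation. The sketch below uses the Population Stability Index, a technique I am swapping in for illustration; the source article does not prescribe a method. The bin edges and sample scores are invented, and any alert threshold would need to be agreed locally.

```python
import math


def psi(expected, actual, edges):
    """Population Stability Index between two samples over shared bin edges."""

    def shares(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(edges) - 1):
                is_last_bin = i == len(edges) - 2
                # Bins are half-open, except the last one which includes its upper edge.
                if edges[i] <= v < edges[i + 1] or (is_last_bin and v == edges[-1]):
                    counts[i] += 1
                    break
        total = sum(counts)
        # A small floor avoids taking the log of zero for empty bins.
        return [max(c / total, 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


edges = [0.0, 0.25, 0.5, 0.75, 1.0]

# Hypothetical risk scores at validation time and from a recent month.
validation_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
recent_scores = [0.6, 0.7, 0.8, 0.9, 0.8, 0.9, 0.85, 0.95]

no_shift = psi(validation_scores, validation_scores, edges)
shift = psi(validation_scores, recent_scores, edges)
print(no_shift, shift)
```

A score shift does not prove the model is wrong. It might mean the patient mix changed, which is itself worth knowing. The point is that someone named in the organization looks at this number on a schedule and has the authority to act on it.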
If your budget for change management and training is less than your budget for software licenses, you do not have a workforce strategy. You have a procurement.
Next steps for digital health designers
I have argued before that refusing to deploy AI in LMIC healthcare carries its own ethical cost. I still believe that. The shortage of clinicians is real, the deaths from poor-quality care are real, and waiting for perfect tools is itself a choice with a body count.
But the urgency argument has been weaponized. “People are dying” has become a license to scale tools that have not earned it. These five questions are a useful corrective because they force a different conversation:
- How has this specific tool, in this specific workflow, with this specific workforce, been designed to help without causing harm?
Equity by design is not a slogan. It is a procurement specification. We should start treating it like one.
