How Do We Solve the Equity Crisis in Clinical Trials Using AI?
September 3, 2025
The reality? AI models trained on biased data don’t just reflect inequality – they operationalise it at scale.
Artificial intelligence (AI) is revolutionising healthcare. From analysing diagnostic scans to streamlining patient recruitment, AI promises faster, smarter, more scalable decision-making. But what happens when these tools inherit, or even amplify, the systemic biases already embedded in medicine?
In a space where health equity should be non-negotiable, AI is quietly creating a new kind of exclusion: one coded in the algorithms themselves.
From Data to Discrimination
It’s a well-documented fact that machine learning systems trained on historical data often perpetuate the very inequalities they aim to solve. A 2019 Science study uncovered that an AI tool used for managing population health significantly underestimated the medical needs of Black patients. Why? Because the model used healthcare spending as a proxy for health needs, and spending is deeply entangled with systemic access issues. Less money is spent on Black patients who have the same level of need, and the algorithm thus falsely concludes that Black patients are healthier than equally sick White patients.
But less care doesn’t equal better health. It often means barriers were present all along.
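To make the mechanism concrete, here is a small synthetic sketch, with invented data and a deliberately simple model rather than the study’s actual system. Two groups have identical distributions of true need, but one group incurs lower recorded spending at every level of need; a model trained to predict spending then systematically under-ranks that group.

```python
# Synthetic illustration only: when a model is trained to predict healthcare
# *spending*, and one group spends less at the same level of need, the model
# ranks that group as "healthier" and flags it for less care management.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 20_000

need = rng.gamma(2.0, 1.0, n)                    # true health need (unseen by the model)
group = rng.integers(0, 2, n)                    # 0 = group A, 1 = group B
access = np.where(group == 1, 0.6, 1.0)          # group B faces access barriers

# Observed utilisation reflects need *and* access, so it is already skewed
prior_visits = rng.poisson(3.0 * need * access)
prior_cost = need * access * 1000 + rng.normal(0, 50, n)

X = np.column_stack([prior_visits, prior_cost])  # features the model actually sees
y = need * access * 1000 + rng.normal(0, 50, n)  # label: next-year spending (the proxy)

risk = LinearRegression().fit(X, y).predict(X)
flagged = risk >= np.quantile(risk, 0.9)         # top 10% get extra care management

for g, name in [(0, "group A"), (1, "group B")]:
    m = group == g
    print(f"{name}: {flagged[m].mean():5.1%} flagged | "
          f"mean true need of flagged patients: {need[m & flagged].mean():.2f}")
# Despite identical need, group B is flagged far less often, and a group B
# patient must be much sicker than a group A patient to be flagged at all.
```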
Fast forward to 2025: a Nature Medicine study exposed how LLM-based healthcare systems reflect real-world biases across race, income, and geography. In the study, AI triage tools responded differently to identical clinical scenarios based solely on a patient’s race, income level, or housing status. The results were alarming:
- Black, LGBTQIA+ and unhoused patients were disproportionately steered toward mental health evaluations.
- Wealthier patients received recommendations for advanced diagnostics such as CT or MRI scans, while low-income patients were offered little or no further testing.
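Findings like these come from what is essentially a counterfactual audit: hold the clinical vignette fixed, vary only the sociodemographic descriptor, and compare the model’s recommendations. The sketch below shows the shape of such an audit; the vignette text is invented and query_model is a placeholder for whichever triage LLM is under test, not the study’s actual harness.

```python
from collections import Counter
from itertools import product

# Placeholder for a call to the triage LLM being audited (e.g. an API request).
def query_model(vignette: str) -> str:
    raise NotImplementedError("plug in the model under test here")

# The clinical picture is held constant; only the descriptor changes.
BASE_VIGNETTE = (
    "A {age}-year-old {descriptor} patient presents with two weeks of worsening "
    "abdominal pain and unintended weight loss. What is the recommended next step?"
)
DESCRIPTORS = ["adult", "Black", "unhoused", "low-income", "privately insured"]

def run_audit(ages=(45, 70)) -> dict:
    return {
        (age, desc): query_model(BASE_VIGNETTE.format(age=age, descriptor=desc))
        for age, desc in product(ages, DESCRIPTORS)
    }

def summarise(results: dict) -> Counter:
    # Tally recommendations per descriptor; systematic divergence on identical
    # clinical facts is the signal of bias worth investigating.
    return Counter((desc, rec) for (_, desc), rec in results.items())
```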
The Downstream Impact: Who Gets Recruited?
This bias isn’t just dangerous in diagnostics — it has serious consequences for clinical research.
AI tools are increasingly used to screen and select patients for trials. If the underlying models favour certain geographies, socioeconomic profiles, or majority ethnic groups, we risk building homogeneous trial populations. That’s not just an ethical problem — it’s a scientific one.
Drugs tested predominantly on White, urban, middle-income patients may perform differently across diverse populations. Lack of diversity leads to:
- Inaccurate efficacy data
- Missed side effects in underrepresented groups
- Slower diagnosis and poorer outcomes for minority patients post-approval
A stark example of algorithmic harm lies beyond trials, in the allocation of life-saving, resource-limited healthcare interventions such as organ transplants. Much like AI used in trial screening, these systems rely on predictive models to decide who gets access. When those models are flawed, the consequences can be fatal.
Case in point: introduced in 2018, the UK’s Transplant Benefit Score (TBS) is an algorithm that helps allocate deceased donor livers to patients with chronic liver disease or primary liver cancer.
It calculates a patient’s score using:
- Predicted survival with a transplant (utility)
- Predicted survival without a transplant (need)
- A prioritisation formula: TBS = utility – need
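To illustrate the arithmetic, the sketch below uses invented survival figures rather than the real model’s registry-derived coefficients. It shows how an implausibly high “survival without transplant” prediction collapses a cancer patient’s benefit score and pushes them down the ranking.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    diagnosis: str
    # Hypothetical model outputs, in expected survival days over five years;
    # the real TBS combines many registry-derived coefficients not shown here.
    survival_with_transplant: float     # "utility"
    survival_without_transplant: float  # "need"

    @property
    def tbs(self) -> float:
        # TBS = predicted survival with transplant - predicted survival without
        return self.survival_with_transplant - self.survival_without_transplant

candidates = [
    Candidate("chronic liver disease", 1650, 400),
    # If the model implausibly predicts long survival *without* a transplant for
    # a cancer patient, their benefit score collapses and they are deprioritised.
    Candidate("primary liver cancer", 1500, 1300),
]

for c in sorted(candidates, key=lambda c: c.tbs, reverse=True):
    print(f"{c.diagnosis:22s} TBS = {c.tbs:6.0f} days")
```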
The goal was to allocate organs to those most likely to benefit — but within three years, serious problems emerged. According to a 2023 Lancet analysis:
- The model implausibly predicted longer survival for chronic liver disease patients if they also had cancer
- This led to cancer patients being deprioritised, rarely receiving transplants
- As a result, death and deterioration rates increased for patients with liver cancer
This case is a cautionary parallel for AI-based trial recruitment: without thorough bias analysis and oversight, AI models can entrench disparities, skew enrolment, and produce results that don’t reflect real-world patient diversity. As with TBS, the danger isn’t malevolent intent — it’s overconfidence in under-tested algorithms making decisions that deeply affect patients’ lives.
What Does Ethical AI Look Like in Clinical Trials?
While much of the conversation around AI in medicine focuses on risk, there are companies actively designing systems to reduce bias, increase transparency, and make trials more inclusive at their core.
- Unlearn.AI uses digital twin technology to forecast individual patient outcomes based on real-world baseline data. These AI-generated twins help improve trial efficiency by enabling statistical adjustments that reduce reliance on large control groups. Trained on diverse historical datasets and validated through regulatory pathways, their models aim to enhance inclusivity and generalisability without compromising scientific rigour (a generic sketch of this kind of covariate adjustment follows this list).
- OWKIN leverages federated learning, a privacy-preserving technique that trains AI models on data stored across hospitals and research centres — without moving or centralising the data itself. By uncovering predictive biomarkers and treatment-response patterns from this diverse, decentralised data, they help researchers design smarter, more inclusive clinical trials without compromising patient privacy or regulatory compliance (a toy federated-averaging sketch also follows the list).
- Deep 6 AI analyses both structured and unstructured electronic health record data — from clinical notes to imaging and lab results — to match patients to trial inclusion and exclusion criteria. This richer data approach surfaces eligible participants often overlooked by traditional systems, improving both speed and diversity in recruitment.
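To give a flavour of the digital-twin idea without claiming to reproduce any vendor’s pipeline, here is a generic sketch of prognostic covariate adjustment on synthetic data: a model trained on historical records predicts each trial participant’s expected outcome from baseline covariates, and that prediction is added as a covariate in the primary analysis, shrinking the standard error of the treatment-effect estimate so fewer control patients are needed for the same power.

```python
# Generic prognostic covariate adjustment on synthetic data (hypothetical setup,
# not any vendor's actual models): a prognostic model trained on historical
# controls supplies a per-patient predicted outcome that the trial analysis
# adjusts for, reducing residual variance.
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
beta = np.array([1.5, -0.8, 0.5, 0.0, 0.3])

# 1) Historical dataset: baseline covariates -> observed outcome
X_hist = rng.normal(size=(5_000, 5))
y_hist = X_hist @ beta + rng.normal(0, 1.0, 5_000)
prognostic_model = GradientBoostingRegressor().fit(X_hist, y_hist)

# 2) Current trial: randomised treatment with a true effect of 0.5
n = 400
X = rng.normal(size=(n, 5))
treat = rng.integers(0, 2, n).astype(float)
y = X @ beta + 0.5 * treat + rng.normal(0, 1.0, n)

score = prognostic_model.predict(X)   # each participant's "digital twin" outcome

# 3) Compare unadjusted vs covariate-adjusted treatment-effect estimates
unadj = sm.OLS(y, sm.add_constant(treat)).fit()
adj = sm.OLS(y, sm.add_constant(np.column_stack([treat, score]))).fit()

print(f"unadjusted effect {unadj.params[1]:.3f} (SE {unadj.bse[1]:.3f})")
print(f"adjusted effect   {adj.params[1]:.3f} (SE {adj.bse[1]:.3f})")
# The adjusted estimate has a markedly smaller standard error on the same data,
# which is what lets a trial keep power with a smaller concurrent control arm.
```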
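Federated learning, the second approach above, is a general technique that can be sketched in a few lines. The toy loop below uses synthetic data and is not OWKIN’s code; it shows the core of federated averaging, where each “hospital” fits an update on its own records and only model weights, never patient data, cross the network.

```python
# Toy federated averaging (FedAvg): each site trains locally and shares only
# parameters, which are averaged weighted by site size. Synthetic data only.
import numpy as np

rng = np.random.default_rng(2)
true_w = np.array([2.0, -1.0, 0.5])

def make_site(n):
    X = rng.normal(size=(n, 3))
    return X, X @ true_w + rng.normal(0, 0.1, n)

sites = [make_site(n) for n in (200, 500, 80)]   # hospitals of different sizes

def local_update(w, X, y, lr=0.05, epochs=20):
    # Plain gradient descent on squared error, using only this site's data.
    for _ in range(epochs):
        w = w - lr * (2 * X.T @ (X @ w - y) / len(y))
    return w

w_global = np.zeros(3)
for _ in range(10):                               # communication rounds
    local_ws = [local_update(w_global.copy(), X, y) for X, y in sites]
    sizes = [len(y) for _, y in sites]
    w_global = np.average(local_ws, axis=0, weights=sizes)

print("federated estimate:", np.round(w_global, 3), "true weights:", true_w)
```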
What Needs to Happen Next?
As AI becomes more embedded in healthcare decision-making, from diagnostics to trial recruitment, we must recognise when it helps, and when it harms. Not every problem is an AI problem.
Before deploying a model, ask:
- Is there enough high-quality, representative data to support reliable predictions across patient groups?
- Will the tool amplify outdated clinical assumptions or diminish critical human judgment?
- Could deploying this tool unintentionally exclude the very patients we need to understand better?
These are not hypotheticals. In many AI tools already in use, inclusion remains an afterthought — or entirely absent. This disconnect between what we can do and what we should do is what experts call the AI chasm: the growing gap between rapid model development and meaningful, evidence-based integration into healthcare systems.
To bridge that gap, and build tools that earn both trust and utility, we need stronger safeguards and shared responsibility across developers, clinicians, and regulators. That includes:
- Regular auditing of AI tools for bias, not just performance metrics (a minimal subgroup-audit sketch follows this list)
- Mandated dataset diversity spanning race, gender identity, income, and geography
- Involvement of patient advocates and equity specialists in the model development lifecycle
- Transparent, explainable outputs, rather than inscrutable black-box logic
- Clear red lines where human judgment must take precedence: some decisions should never be fully automated, particularly those with life-altering consequences
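On the first of those points, auditing for bias means, at minimum, reporting the same metrics stratified by subgroup and flagging large gaps, rather than publishing a single aggregate score. The sketch below uses hypothetical column names and omits the confidence intervals and intersectional breakdowns a real audit would need.

```python
# Minimal subgroup audit sketch (hypothetical column names): report selection
# rate and sensitivity per subgroup and flag gaps above a tolerance, instead of
# relying on a single aggregate performance number.
import pandas as pd

def subgroup_audit(df: pd.DataFrame, group_col: str, y_true: str, y_pred: str,
                   max_gap: float = 0.10) -> pd.DataFrame:
    rows = []
    for group, sub in df.groupby(group_col):
        positives = sub[sub[y_true] == 1]
        rows.append({
            group_col: group,
            "n": len(sub),
            "selection_rate": sub[y_pred].mean(),
            "sensitivity": positives[y_pred].mean() if len(positives) else float("nan"),
        })
    report = pd.DataFrame(rows)
    for metric in ("selection_rate", "sensitivity"):
        gap = report[metric].max() - report[metric].min()
        report.attrs[f"{metric}_gap_flagged"] = bool(gap > max_gap)
    return report

# Example usage, assuming columns "ethnicity", "eligible" (ground truth) and
# "ai_selected" (model decision) exist in the recruitment dataset:
# report = subgroup_audit(df, "ethnicity", "eligible", "ai_selected")
```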
On the regulatory front, frameworks like CONSORT-AI and SPIRIT-AI have begun adapting clinical trial reporting and design standards for studies involving AI. These guidelines are actively evolving to address the nuanced risks and ethical demands of AI-enabled interventions. But adoption is uneven, and enforcement remains limited.
Both companies and policymakers have a role to play in closing the AI chasm. Only by centring ethics, transparency, and real-world equity in model design can we ensure that AI in healthcare doesn’t just accelerate progress — but expands it.
Asking the Hard Questions
Before we implement any AI in clinical research, we must stop and ask:
- Who built the model?
- Whose data was used?
- Who audits the outputs — and how often?
- What happens when it fails?
Technology doesn’t solve bias on its own. It reflects the values — or blind spots — of those who create it.
Let’s not allow AI to become the new gatekeeper in medicine. Let’s design systems that open doors, not close them.
Are you working on inclusive AI in clinical research? Which tools have you seen move the needle in a positive direction?