recommendations and a self-assessment checklist to enable AI designers, developers, evaluators
and regulators to develop trustworthy and ethical AI solutions in medicine and healthcare.
Risk identification through comprehensive, multi-faceted clinical evaluation of AI solutions
While identifying and mitigating risks in medical AI by means of adequate evaluation studies is
crucial, the existing scientific literature has focused mostly on evaluating the accuracy and robustness
of AI tools in laboratory settings. Other aspects of medical AI, such as clinical safety and
effectiveness, fairness and non-discrimination, transparency and traceability, as well as privacy and
security, are more challenging to evaluate in controlled environments and have thus received far
less attention in scientific literature.
There is a need for a more holistic, multi-faceted evaluation approach for future AI solutions in
healthcare. Best practices to enhance clinical evaluation and deployment include: (i) employing
standard definitions of clinical tasks (e.g. disease definition) to enable objective community-driven
evaluations; (ii) defining performance elements beyond accuracy, such as fairness, usability,
explainability and transparency (see the illustrative sketch below); (iii) subdividing the evaluation process into stages of increasing
complexity (i.e. to assess feasibility, then capability, effectiveness and durability); (iv) promoting
external evaluations by independent third-party evaluators; and (v) employing standardised
guidelines for reporting the AI evaluation results to increase reproducibility, transparency and trust.
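To make point (ii) more concrete, the following is a minimal, purely illustrative sketch in Python of how an evaluator might report subgroup-level performance and a simple fairness gap alongside overall accuracy. The metric choice (difference in true-positive rate between patient subgroups), the function name and the synthetic data are assumptions made for this example, not prescriptions of this study.

```python
import numpy as np

def evaluate_beyond_accuracy(y_true, y_pred, group_labels):
    """Illustrative multi-faceted evaluation: overall accuracy plus a simple
    fairness gap (largest difference in true-positive rate between patient
    subgroups). Metric choices are assumptions for this sketch only."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    group_labels = np.asarray(group_labels)

    results = {"accuracy": float(np.mean(y_true == y_pred))}

    # True-positive rate (sensitivity) per subgroup, e.g. by sex or age band.
    tpr_per_group = {}
    for g in np.unique(group_labels):
        positives_in_group = (group_labels == g) & (y_true == 1)
        if positives_in_group.sum() > 0:
            tpr_per_group[str(g)] = float(np.mean(y_pred[positives_in_group] == 1))
    results["tpr_per_group"] = tpr_per_group

    # Fairness gap: difference between the best- and worst-served subgroup.
    if tpr_per_group:
        results["tpr_gap"] = max(tpr_per_group.values()) - min(tpr_per_group.values())
    return results

# Example with synthetic data for two hypothetical patient subgroups.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_pred = rng.integers(0, 2, size=200)
groups = rng.choice(["A", "B"], size=200)
print(evaluate_beyond_accuracy(y_true, y_pred, groups))
```

In practice, such subgroup metrics would complement, not replace, the staged clinical evaluations, usability assessments and independent external validations described above.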
Policy options
- Extend AI regulatory frameworks and codes of practice to address healthcare-specific risks
and requirements
In order to tailor existing frameworks and AI practices specifically to the medical field, multi-faceted
risk assessment should be an integral part of the medical AI development and certification process.
Furthermore, risk assessment must be domain-specific, as the clinical and ethical risks differ across
medical fields (e.g. radiology or paediatrics). In the future regulatory framework, the
validation of medical AI technologies should be harmonised and strengthened to assess and
identify multi-faceted risks and limitations by evaluating not only model accuracy and robustness
but also algorithmic fairness, clinical safety, clinical acceptance, transparency and traceability.
- Promote multi-stakeholder engagement and co-creation throughout the whole lifecycle of
medical AI algorithms
For the future acceptability and implementation of medical AI tools in the real world, many
stakeholders beyond AI developers – such as clinicians, patients, social scientists, healthcare
managers and AI regulators – will play an integral role. Hence, new approaches are needed to
promote inclusive, multi-stakeholder engagement in medical AI and ensure the AI tools are
designed, validated and implemented in full alignment with the diversity of real-world needs and
contexts. Future AI algorithms should therefore be developed by AI manufacturers based on co-creation, i.e. through strong and continuous collaboration between AI developers and clinical end-users, as well as with other relevant experts such as biomedical ethicists.
Integrating human- and user-centred approaches throughout the whole AI development process
will enable the design of AI algorithms that better reflect the needs and cultures of healthcare
workers, while also enabling potential risks to be identified and addressed at an early stage.
- Create an AI passport and traceability mechanisms for enhanced transparency and trust in
medical AI
New approaches and mechanisms are needed to enhance the transparency of AI algorithms
throughout their lifecycle. From this need emerges the concept of an ‘AI passport’ for the
standardised description and traceability of medical AI tools. Such a passport should describe and