April 11, 2026

Julien Beaudaux

Software QARA Director, with over 10 years' experience in the medical device and pharmaceutical industries.

AI Traceability and Explainability: A Healthcare Imperative

In medicine, a decision without traceability is a contestable decision. A diagnosis made without reasoning, a protocol followed without record-keeping: these are not merely procedural deviations, they are patient risks.

AI is making its way into medical devices. It detects, classifies, recommends, and even decides. And here again, a decision produced by an algorithm must be just as traceable and explainable as a human decision. This is the baseline requirement for a clinician to be able to take responsibility, for a patient to trust their treatment, and for a manufacturer to defend what it has placed on the market.

The good news is that, on this point, the MD regulatory framework and the AI Act are perfectly aligned.

The Black Box Paradox

As always, compliance is demonstrated, not asserted. Every design decision, every risk assessment, every verification must be traceable to its origin through logical reasoning.

Yet most mainstream AI tools operate as black boxes: you enter a question, you get an answer. This is inherent to the technology itself. The chain of reasoning that produced a response is by nature opaque, non-reproducible, and often unverifiable.

The question is therefore not only one of performance (there is no shortage of examples where AI outperforms human decision-making), but of how to rely on it without exposing yourself to liability.

What the Regulations Say

The MD regulatory framework was not designed for AI, but it applies to it with notable consistency. Let us review the key requirements and their translation for us mere mortals.

MDR – Article 10 and Annex I
The MDR requires full traceability of design decisions and demonstrable risk management documentation. Applied to AI: every output produced by an automated system must be traceable to documented reasoning.

IEC 62304 – Medical Device Software Lifecycle
IEC 62304 governs the design, verification and validation of MD software, and embedded AI is MD software. This implies: software risk classification, traceability requirements between model outputs and functional specifications, and documentation sufficient to make V&V repeatable and defensible. A trained model whose decisions cannot be explained is de facto very difficult to validate under IEC 62304.

ISO 14971 – Risk Management
ISO 14971 requires that every identified risk be assessed through documented reasoning, and that control measures be justified. For an AI involved in a clinical decision, this raises a direct question: how do you assess the risk of an erroneous decision without understanding the reasoning that produced it?

ISO 13485 – §8.2.1 and Post-Market Surveillance
Beyond document control, ISO 13485 requires systematic feedback on device performance under real-world conditions. For embedded AI, this means being able to detect behavioural drift, which presupposes that expected behaviour has been documented precisely enough to identify a deviation.

AI Act – Articles 9, 12, 13, 50 and 86
The AI Act classifies medical devices as high-risk AI systems (Article 6(1) and Annex I, which lists the MDR among the Union harmonisation acts), with obligations that overlay the MDR.

  • Article 9: The risk management system must specifically cover the AI system, and be documented and maintained throughout the lifecycle.
  • Article 12: Usage logs must be automatically retained: dates, usage context, outputs produced.
  • Article 13: The system's technical documentation must enable users to understand what the system does and the limits of its reliability.
  • Article 50: Users must be informed that they are interacting with an AI system. A clinician consulting a recommendation generated by an algorithm must be explicitly notified.
  • Article 86: Any person subject to a decision assisted by high-risk AI may demand an explanation of the underlying reasoning. This right is enforceable; the manufacturer bears liability.
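Article 12's logging duty is concrete enough to sketch in code. Here is a minimal illustration of an automatically retained, append-only usage log; the `UsageLogEntry` record and its field names are my own illustration, not terms from the regulation.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class UsageLogEntry:
    """One Article 12-style usage record: when, by whom, in what context, what output."""
    timestamp: str       # ISO 8601, UTC
    user_id: str         # who launched the analysis
    model_version: str   # which model produced the output
    usage_context: str   # e.g. "document review"
    input_ref: str       # reference to the input, not the raw data
    output: str          # the output produced

def log_usage(entry: UsageLogEntry, sink: list) -> None:
    # Append-only: entries are immutable and serialised, never rewritten.
    sink.append(json.dumps(asdict(entry)))

audit_log: list = []
log_usage(UsageLogEntry(
    timestamp=datetime.now(timezone.utc).isoformat(),
    user_id="clinician-042",
    model_version="model-1.3.0",
    usage_context="document review",
    input_ref="doc-7781",
    output="non-compliant: missing risk acceptability criteria",
), audit_log)
```

In a real system the sink would be a tamper-evident store, not an in-memory list; the point is that each record captures date, context and output together.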

The texts are mutually consistent: each requires, from a different angle, that algorithmic decisions be documented, explainable, and defensible.

Explainability Techniques: What Is Actually Usable

With the foundations laid, the question becomes: how do we actually do this? Here are the Explainable AI (XAI) levers I use, all of which can of course be combined. And to back up my claims, I include examples of how we implemented them at Gordios.

Source-Based Justification
This is the most obvious approach. You force the AI to cite the exact data it used to reach its conclusion. This can be data integrated upstream (a scientific corpus, HAS recommendations, etc.) and/or input data (a region on an X-ray, a passage from a conversation). At Gordios, every document review point is supported by the source (the standard requirement) AND the relevant passage from the document.
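One simple way to make source-based justification checkable is to reject any conclusion whose cited passage does not actually appear in the reviewed document. A minimal sketch, assuming a hypothetical `Justification` structure (not Gordios's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Justification:
    conclusion: str      # the AI's finding
    source_id: str       # e.g. the standard clause relied on
    quoted_passage: str  # the exact passage cited from the reviewed document

def verify_citation(j: Justification, document_text: str) -> bool:
    """Accept a conclusion only if its quoted passage appears verbatim
    in the reviewed document: the justification is grounded, not invented."""
    return j.quoted_passage in document_text

document = "The risk management file shall define risk acceptability criteria."
grounded = Justification(
    conclusion="Risk acceptability criteria are defined",
    source_id="ISO 14971 clause on risk evaluation",
    quoted_passage="define risk acceptability criteria",
)
```

The verbatim check is deliberately strict: a paraphrased or hallucinated quote fails, which is exactly the failure mode you want to surface.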

Confidence Scoring
You can also establish thresholds for a confidence score (documented in the QMS 😉), and require the AI to indicate its degree of confidence in the result with every decision. It is even possible for another AI agent (a verifier) to assign this score independently. Below the threshold, an alert or manual review is triggered. Gordios integrates a confidence score determined by a different AI agent, and any result with a score below 90% triggers a manual review of the review point.
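The routing logic behind such a threshold is trivial to express, which is part of its appeal to auditors. A sketch, with a 0.90 threshold mirroring the figure above; the function and its labels are illustrative, not Gordios's API:

```python
CONFIDENCE_THRESHOLD = 0.90  # the documented QMS threshold

def route_result(result: str, verifier_score: float) -> str:
    """Route a result based on the independent verifier agent's score:
    below the documented threshold, the result goes to manual review."""
    if verifier_score < CONFIDENCE_THRESHOLD:
        return "manual_review"
    return "auto_accepted"
```

The key design point is that the score comes from a separate verifier agent, so the model under review cannot grade its own homework.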

Decomposed Reasoning Chain
Some systems allow the AI's reasoning to be structured into verifiable steps. Each step produces a result, but the overall result depends on a decision tree or a separate pass. Each step can therefore be independently verified. We used this approach at Gordios for complex verifications on multi-criteria standards.
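The structure of such a chain can be sketched as a list of independently checkable steps whose intermediate results are all recorded, with the overall verdict computed in a separate pass over the trace. The step functions below are placeholder criteria, not real review rules:

```python
from typing import Callable

# Each step returns (ok, evidence). The chain records every intermediate
# result so that each step can be audited on its own.
Step = Callable[[dict], tuple]

def has_intended_use(doc: dict) -> tuple:
    return ("intended_use" in doc, "checked field: intended_use")

def has_risk_file(doc: dict) -> tuple:
    return ("risk_file" in doc, "checked field: risk_file")

def run_chain(doc: dict, steps: list) -> dict:
    trace = []
    for step in steps:
        ok, evidence = step(doc)
        trace.append({"step": step.__name__, "ok": ok, "evidence": evidence})
    # The overall verdict is a separate pass over the recorded steps,
    # not something the model asserts in one opaque shot.
    return {"compliant": all(t["ok"] for t in trace), "trace": trace}

report = run_chain({"intended_use": "..."}, [has_intended_use, has_risk_file])
```

Because the trace survives alongside the verdict, a reviewer can pinpoint exactly which criterion failed rather than re-running the whole analysis.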

Software Testing
Synthetic data creation, test runs under real-world conditions, etc. Nothing new under the sun: software testing remains the best guarantee of a quality system.
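For the synthetic-data side, the essential properties are a known ground truth and reproducibility (a fixed seed), so the same test run can be repeated and defended. A generic sketch, with hypothetical case fields:

```python
import random
from typing import Callable

def make_synthetic_cases(n: int, seed: int = 42) -> list:
    """Synthetic review cases with known ground truth. The fixed seed
    makes the test set reproducible run after run."""
    rng = random.Random(seed)
    return [
        {"id": f"case-{i}", "expected": rng.random() > 0.5}
        for i in range(n)
    ]

def evaluate(system: Callable, cases: list) -> float:
    """Fraction of cases where the system matches the known ground truth."""
    hits = sum(1 for c in cases if system(c) == c["expected"])
    return hits / len(cases)

# Example: a trivial always-"compliant" system, scored against 100 cases.
accuracy = evaluate(lambda c: True, make_synthetic_cases(100))
```

In practice the synthetic cases would encode edge cases identified in the risk analysis, and the accuracy gate would be a documented acceptance criterion.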

The Disclosure Obligation: The Point Many Overlook

The AI Act introduces an obligation often underestimated in B2B contexts: the user must be explicitly informed that they are interacting with an AI. In practice, this means that every result must be labelled as AI-produced, in a visible and unambiguous manner. And if that result influences a clinical decision, Article 86 grants the user an enforceable right to an explanation of the underlying reasoning.

Traceability: Two Distinct Levels Not to Be Confused

Traceability in a regulatory AI context operates on two levels that are essential to distinguish.

  • Tool traceability: who launched which analysis, on which version, on what date, with what parameters. In short, an audit trail: "what was done and by whom?"
  • Decision traceability: why this point is deemed compliant or non-compliant. In short: "on what basis was this conclusion produced?"

Both are necessary. Tool traceability alone does not allow you to defend the conclusion. Decision traceability alone, without usage logs, satisfies neither Article 12 of the AI Act nor the MDR's post-market surveillance requirements (Articles 83 et seq.).
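The distinction becomes obvious when you model the record itself: a defensible review record carries both levels side by side. A sketch with hypothetical structures and field values (the `rule_id` below is illustrative, not a real citation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolTrace:
    """Audit trail: who launched which analysis, on which version, when."""
    user: str
    tool_version: str
    timestamp: str  # ISO 8601

@dataclass(frozen=True)
class DecisionTrace:
    """Rationale: on what basis the conclusion was produced."""
    verdict: str
    rule_id: str         # the requirement the verdict rests on
    cited_passage: str   # the evidence supporting it

@dataclass(frozen=True)
class ReviewRecord:
    # Neither half is defensible alone: the tool trace without the
    # rationale, or the rationale without the usage log, is incomplete.
    tool: ToolTrace
    decision: DecisionTrace

record = ReviewRecord(
    tool=ToolTrace("qara-01", "2.4.1", "2026-04-11T09:30:00Z"),
    decision=DecisionTrace(
        verdict="non-compliant",
        rule_id="hypothetical-GSPR-3",
        cited_passage="no benefit-risk analysis provided",
    ),
)
```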

Conclusion

In healthcare, traceability and explainability must be design characteristics. In practice, this means:

  • Pointing to the exact source passage for every result produced
  • Maintaining a timestamped and immutable log of every execution
  • Clearly indicating to the user that the result comes from an AI
  • Documenting confidence thresholds and the policy for handling edge cases
  • Producing reproducible results on the same document
  • Testing, testing, and testing again