There, it’s said. A recent investigative article by Forbes staff writer Parmy Olson, who is European-based (as opposed to Forbes’ innumerable guest writers), dropped a week before Christmas Eve and raised some uncomfortable questions about Babylon Health, certainly the star health tech company on the UK scene. These uncomfortable bits are not unknown to our Readers of these pages, nor to those in the UK independently following the company’s engagement with the NHS.
Most of the skepticism is around their chatbot symptom checker, which has been improved and tested over time, but even the testing has been doubted. The Royal College of Physicians, Stanford University, and Yale New Haven Health subjected Babylon and seven primary care physicians to 100 independently devised symptom sets from the MRCGP, with Babylon achieving a much-publicized score of 80. A letter published in the Lancet (as correspondence) questioned the study’s methodology and results: the data were entered by doctors, not by typical users of Babylon Health; there was no statistical significance testing; and the letter claims that the poor performance of one doctor in the sample skewed results in Babylon’s favor. [TTA 8 Nov]
The real questions raised by the Lancet correspondence and the Forbes article are around establishing standards, testing the app against those standards, and accurate follow-up. In other words, if Babylon were a drug or a medical device, something close to a clinical trial:
- No real-world evaluation is being done through a gradual escalation of steps testing usability, effectiveness, and safety.
- How does the checker balance the probability of a disease with the risk of missing a critical diagnosis?
- How do users interact with these symptom checkers? What do they do afterwards? What are the outcomes?
Former Babylon staffers, according to the Forbes article, claim there is no follow-up. The article also states that “Babylon says its GP at Hand app sends a message to its users 24 hours after they engage with its chatbot. The notification asks about further symptoms, according to one user.” Where is the research on that follow-up?
Rectifying this is where Babylon gets sketchy and less than transparent. None of its testing or results has been published in peer-reviewed journals. Moreover, the company is not helped by, in this Editor’s view, its chief medical officer stating that it will publish in journals “when Babylon produces medical research.” That is a sad statement, given the crying need to triage symptoms within the UK medical system to lessen wait times at GPs and hospitals. And even then, Babylon is referring patients to the ED 30 percent of the time, compared with the NHS 111 line’s 20 percent. Is no one at Babylon or at the NHS curious about the difference?
And the chatbot is evidently still missing things.