There, it’s said. A recent investigative article by a Forbes staff writer (as opposed to one of their innumerable guest writers), Europe-based Parmy Olson, that dropped a week before Christmas Eve raised some uncomfortable questions about Babylon Health, certainly the star health tech company on the UK scene. These uncomfortable bits are not unknown to our Readers from these pages, nor to those in the UK independently following the company’s engagement with the NHS.
Most of the skepticism is around their chatbot symptom checker, which has been improved over time and tested, but even the testing has been doubted. The Royal College of Physicians, Stanford University and Yale New Haven Health subjected Babylon and seven primary care physicians to 100 independently-devised symptom sets in the MRCGP, with Babylon achieving a much-publicized test score of 80. A letter published in the Lancet (correspondence) questioned the study’s methodology and the results: the data was entered by doctors, not by the typical user of Babylon Health; there was no statistical significance testing; and, the letter claims, the poor performance of one doctor in the sample skewed results in Babylon’s favor [TTA 8 Nov].
The real questions raised by the Lancet correspondence and the article are around establishing standards, testing the app against existing standards, and accurate follow-up–in other words, if Babylon were a drug or a medical device, close to a clinical trial:
- Real-world evaluation, following a gradual escalation of steps that test usability, effectiveness, and safety, is not being done.
- How does the checker balance the probability of a disease with the risk of missing a critical diagnosis?
- How do users interact with these symptom checkers? What do they do afterwards? What are the outcomes?
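The second question above, weighing the probability of a disease against the risk of missing a critical diagnosis, can be made concrete with a simple expected-cost rule. What follows is only a minimal sketch under stated assumptions: the function names and the cost figures are hypothetical illustrations by this Editor, and nothing here describes how Babylon's checker actually works.

```python
# Hypothetical sketch of an expected-cost triage rule.
# Not Babylon's method; the names and cost values are illustrative assumptions.

def referral_threshold(cost_missed: float, cost_unneeded_referral: float) -> float:
    """Probability above which referring has lower expected cost than not referring.

    Refer when p * cost_missed > (1 - p) * cost_unneeded_referral,
    which rearranges to p > c_ref / (c_ref + c_miss).
    """
    return cost_unneeded_referral / (cost_unneeded_referral + cost_missed)


def should_refer(p_disease: float, cost_missed: float,
                 cost_unneeded_referral: float) -> bool:
    """Compare the expected cost of a missed diagnosis with that of a needless referral."""
    expected_cost_if_not_referred = p_disease * cost_missed
    expected_cost_if_referred = (1 - p_disease) * cost_unneeded_referral
    return expected_cost_if_not_referred > expected_cost_if_referred


if __name__ == "__main__":
    # Assumed costs: a missed critical diagnosis is 99 times worse than a needless referral.
    print(referral_threshold(cost_missed=99, cost_unneeded_referral=1))
    print(should_refer(0.02, cost_missed=99, cost_unneeded_referral=1))
```

The point of the sketch is that the more catastrophic a missed diagnosis is judged to be, the lower the probability at which a checker should refer, which is why the chosen costs (and who chose them) deserve publication and scrutiny.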
Former Babylon staffers, according to the Forbes article, claim there is no follow-up. The article also states that “Babylon says its GP at Hand app sends a message to its users 24 hours after they engage with its chatbot. The notification asks about further symptoms, according to one user.” Where is the research on that follow-up?
Rectifying this is where Babylon gets sketchy and less than transparent. None of their testing or results have been published in peer-reviewed journals. Moreover, they are not helped by, in this Editor’s view, their chief medical officer stating that they will publish in journals “when Babylon produces medical research.” This is a sad statement, given the crying need for triaging symptoms within the UK medical system to lessen wait times at GPs and hospitals. But even then, Babylon is referring patients to the ED 30 percent of the time, compared to the NHS’ 111 line at 20 percent. Is no one there or at the NHS curious about the difference?
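For anyone who is curious about that 30 versus 20 percent gap, a first check is whether it could be down to chance, the kind of statistical significance testing the Lancet letter found missing. Below is a minimal sketch of a standard two-proportion z-test. The sample sizes are hypothetical, since neither the article nor this Editor has the underlying patient counts, and the function name is this Editor's own.

```python
import math

# Sketch of a pooled two-proportion z-test.
# The patient counts used in the example are HYPOTHETICAL; the real
# denominators behind the 30 and 20 percent figures are not known here.

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Return (z, two-sided p-value) for H0: the two referral rates are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Assumed: 100 users on each service, 30 and 20 of them referred to the ED.
    print(two_proportion_z_test(30, 100, 20, 100))
    # Assumed: the same rates observed across 1,000 users on each service.
    print(two_proportion_z_test(300, 1000, 200, 1000))
```

With only 100 hypothetical users per service the gap does not reach significance at the usual 0.05 level, while with 1,000 per service it clearly does. That is exactly why the percentages alone settle nothing and the underlying numbers need to be published.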
And the chatbot is evidently still missing things. In June, a doctor testing it found that it missed clear symptoms of a hypothetical pulmonary embolism. The doctor posted a video of the error on Twitter and complained to the UK Medicines and Healthcare products Regulatory Agency (MHRA)–and it was the third incident of its type. The MHRA has not taken action, however. The article also states that their in-house doctors have trouble being heard and that the company overall acts like a Silicon Valley startup–build it fast and get it out the door. Except that approach doesn’t work well in the medical field, where you are dealing with human lives and health.
Babylon is not a startup. Their funding has topped £57 million (US$72 million) at Series B (Crunchbase). The major fact is that both GP At Hand and the symptom checker have medical impact. They have contracts and licensing agreements with the NHS, Bupa, Samsung Health, and Prudential Asia, and they advertise widely. Certainly with their funding at hand, now is the time for testing efficacy and long term diagnostic outcomes.
Let’s compare Babylon Health to an established US company, Zipnosis, which is also a symptom checker, but their context is ‘white labeled’ within a medical system. Their online adaptive interview replicates the first stage of an in-person office visit and adds tracking analytics for the health system, channeling to an online referral within the system. Certainly the same questions should be asked of Zipnosis, but why are there no peer-reviewed studies for Babylon, or even plentiful partner-based case studies, which Zipnosis evidently has?
Perhaps these two companies should put their corporate heads together.
Babylon is also not Theranos. Here is where this Editor differs with Forbes. The article, when read by itself, raises legitimate questions on how Babylon is operating. Ms. Olson is a real reporter doing a real investigation. It is sourced, she dug to find real questions, she found real evidence of lack of rigor, dare we say cut corners, and she wrote about these issues soberly. But the two pulp fiction-quality cartoon illustrations with a smiley face on a smartphone with distressed users create a sensational cast and skew the finished result.
We hope Ms. Olson continues to investigate and publish like the estimable John Carreyrou with Theranos–unlike The Fraud That Was Theranos, she has an opportunity to be part of making Babylon a better company. Babylon also owes her some answers and transparency. We hope she continues to pursue the story.
At this point, there’s no evidence that Babylon is vaporware. There are no non-working Edison machines and miniLabs which Theranos hid behind a veil of extreme secrecy, Silicon Valley hype, and ‘I Want To Believe’. Babylon is working with the NHS with GP At Hand and has licensed their technology to savvy companies. CEO/Founder Ali Parsa isn’t Stanford dropout Elizabeth Holmes channeling Steve Jobs and charming old men–he has a PhD in engineering physics, has made himself available to the UK medical community (Roy Lilley’s September chat), and has not a blond hair in sight. Mr. Parsa has also declared a $100 million investment in AI to bolster Babylon’s tech, though other work in using symptom data is either not there or not known.
This Editor would suggest to Mr. Parsa that a few steps backward to resolve the questions raised above and in the article are critically needed before moving forward. Quality testing and peer-reviewed studies are needed, along with supportive internal quality processes. Action in these areas would bolster the company’s reputation, because these issues will be raised by European authorities and in the US, both of which represent the future for Babylon Health.