AI systems are not as rigorously tested as other medical devices, and have already made serious mistakes
Few tech startups publish their research in peer-reviewed journals, which allow other scientists to scrutinize their work, according to a January article in the European Journal of Clinical Investigation. Such “stealth research”—described only in press releases or promotional events—often overstates a company’s accomplishments.
And although software developers may boast about the accuracy of their AI devices, experts note that AI models are mostly tested on computers, not in hospitals or other medical facilities. Using unproven software “may make patients into unwitting guinea pigs,” said Ron Li, medical informatics director for AI clinical integration at Stanford Health Care.
Health products powered by artificial intelligence, or AI, are streaming into our lives, from virtual doctor apps to wearable sensors and drugstore chatbots. IBM boasted that its AI could “outthink cancer.” Others say computer systems that read X-rays will make radiologists obsolete.
Even the U.S. Food and Drug Administration—which has approved more than 40 AI products in the past five years—says “the potential of digital health is nothing short of revolutionary.”
Yet many health industry experts fear AI-based products won’t be able to match the hype. Doctors and consumer advocates worry that the tech industry, which lives by the mantra “fail fast and fix it later,” is putting patients at risk—and that regulators aren’t doing enough to keep consumers safe.
Early experiments in AI provide reason for caution, said Mildred Cho, a professor of pediatrics at Stanford’s Center for Biomedical Ethics.
Systems developed in one hospital often flop when deployed in a different facility, Cho said. Software used in the care of millions of Americans has been shown to discriminate against minorities. And AI systems sometimes learn to make predictions based on factors that have less to do with disease than the brand of MRI machine used, the time a blood test is taken or whether a patient was visited by a chaplain. In one case, AI software incorrectly concluded that people with pneumonia were less likely to die if they had asthma, an error that could have led doctors to deprive asthma patients of the extra care they need.
“It’s only a matter of time before something like this leads to a serious health problem,” said Steven Nissen, chairman of cardiology at the Cleveland Clinic.
Medical AI, which pulled in $1.6 billion in venture capital funding in the third quarter alone, is “nearly at the peak of inflated expectations,” concluded a July report from the research company Gartner. “As the reality gets tested, there will likely be a rough slide into the trough of disillusionment.”
Experts such as Bob Kocher, a partner at the venture capital firm Venrock, are more blunt. “Most AI products have little evidence to support them,” Kocher said. Some risks won’t become apparent until an AI system has been used by large numbers of patients. “We’re going to keep discovering a whole bunch of risks and unintended consequences of using AI on medical data,” Kocher said.
None of the AI products sold in the U.S. have been tested in randomized clinical trials, the strongest source of medical evidence, said Dr. Eric Topol, director of the Scripps Research Translational Institute. The first and only randomized trial of an AI system—which found that colonoscopy with computer-aided diagnosis detected more small polyps than standard colonoscopy—was published online in October.
Yet the majority of AI devices don’t require FDA approval.
“None of the companies that I have invested in are covered by the FDA regulations,” Kocher said.
Legislation passed by Congress in 2016—and championed by the tech industry—exempts many types of medical software from federal review, including certain fitness apps, electronic health records and tools that help doctors make medical decisions.
There’s been little research on whether the 320,000 medical apps now in use actually improve health, according to a report on AI published Dec. 17 by the National Academy of Medicine.
“Almost none of the [AI] stuff marketed to patients really works,” said Ezekiel Emanuel, professor of medical ethics and health policy in the Perelman School of Medicine at the University of Pennsylvania.
Some software developers don’t bother to apply for FDA clearance or authorization, even when legally required, according to a 2018 study in Annals of Internal Medicine.
Industry analysts say that AI developers have little interest in conducting expensive and time-consuming trials. “It’s not the main concern of these firms to submit themselves to rigorous evaluation that would be published in a peer-reviewed journal,” said Joachim Roski, a principal at Booz Allen Hamilton, a technology consulting firm, and co-author of the National Academy’s report. “That’s not how the U.S. economy works.”