Synthetic data’s growing role in healthcare AI, machine learning and robotics

Today there is a bottleneck in the development of artificial intelligence and machine learning – real-world data collection. AI and machine learning models require large datasets to become proficient at a task.

But preparing these datasets for model training is both costly and labor intensive. It is a conundrum, and the lack of large, accurately labeled datasets for specific applications is holding back the development of artificial intelligence and machine learning.

Some say synthetic data, meaning computer-generated data that imitates real-world data, offers a solution. Instead of being manually collected and labeled from the real world, synthetic datasets are generated by computer.

“The ultimate result is that artificial intelligence and machine learning models can be trained faster, more cost-effectively and without the constraints of real-world data collection,” said Michael Naber, CEO and cofounder of Simerse, which creates synthetic data to train AI and machine learning models. The company offers a free guide to synthetic data.

Healthcare IT News interviewed Naber for a deep dive into synthetic data and its role in healthcare.

Q: Why should healthcare CIOs, CMIOs and other health IT executives be aware of synthetic data? Why is it important to them?

A: Health IT executives should take note of synthetic data because it represents the future of robotic surgery and data-powered medicine. I estimate that robotic surgery will follow the development path of autonomous vehicles. Think about it: You can’t trust an AI robot to perform surgery on real people right away. You will have to do it millions of times in simulation first to prove the technology’s safety.

Companies that want to be on the forefront of this revolution should be seriously investigating the concept of synthetic data, or at least partnering with a company focused on synthetic data. Healthcare CIOs may not want to build this capability in-house, but should prepare their businesses for lowered AI model-training costs as a result of this oncoming technological wave.

Beyond synthetic data, computer vision as a whole is a promising technology for the healthcare industry. From a health and safety perspective, computer vision can monitor how often and how long healthcare providers wash their hands, evaluate in real time whether hospital-bed patients are motioning for help, and keep track of medical tools within a room.

Q: What is synthetic data for medical imaging?

A: Computer vision applied to medical imaging is certainly a promising area of medical research. Numerous companies and universities are working to apply machine learning to detect diseases, bone fractures and a whole host of other ailments from medical images.

Synthetic data will only accelerate this research. By leveraging synthetic data, researchers will be able to artificially create synthetic injuries in medical images, and then teach computers to detect and analyze those medical images.
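As a rough illustration of that idea, and not a description of Simerse's own tooling, the sketch below generates a toy "scan" with one artificially inserted bright region and, at the same time, the pixel mask that labels it. The function name `make_synthetic_scan`, the noise levels and the circular shape are all assumptions chosen for simplicity; realistic medical synthetic data would need far more faithful image models.

```python
import random


def make_synthetic_scan(size=64, radius=5, seed=0):
    """Return (image, mask, centre) for a toy grayscale grid with one bright blob.

    Hypothetical helper for illustration only: the "lesion" is a plain circle
    and the background is uniform noise, not a model of real anatomy.
    """
    rng = random.Random(seed)
    # Background: low-intensity noise standing in for healthy tissue.
    image = [[rng.uniform(0.0, 0.3) for _ in range(size)] for _ in range(size)]
    mask = [[0] * size for _ in range(size)]
    # Pick a centre far enough from the border that the blob fits entirely.
    cx = rng.randint(radius, size - radius - 1)
    cy = rng.randint(radius, size - radius - 1)
    for y in range(size):
        for x in range(size):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                image[y][x] += 0.6  # brighten pixels inside the blob
                mask[y][x] = 1      # the label is produced alongside the image
    return image, mask, (cx, cy)


# Each call yields an image and a perfectly aligned label, with no manual annotation.
dataset = [make_synthetic_scan(seed=s) for s in range(100)]
```

The point of the sketch is that the label is a by-product of generation: because the program decides where the abnormality goes, it already knows the ground truth for every pixel.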

In the future, AI likely will be assisting doctors not only in diagnosing injuries, but in giving medical recommendations as well. Synthetic datasets can help evaluate the impact of AI-powered diagnosis by creating a circular feedback loop, and may even be able to act as a testbed for AI-derived recommendations. 

Artificial intelligence in healthcare certainly has a long way to go from its current state of research and development, but the implications of accurate diagnosis and recommendations powered by AI in healthcare are profound.

Synthetic data also addresses some personally identifiable information and HIPAA concerns. Synthetic data is generated by a computer, and therefore is not based on real people’s data or health records. For a healthcare CIO, this is a blessing. Synthetic data devoid of people’s private data can be stored in a wide range of data centers under less stringent regulations.

Q: Where does robotics come in with synthetic data? How can synthetic data be used to teach robots how to operate?

A: As I mentioned, robots generally require extensive simulation before they can be adapted to real-world use. Creating the simulations necessary to teach robots how to operate is a challenge of both complexity and realism. These simulations will need to be highly realistic, which implies an incredibly high threshold of complexity necessary to properly teach robots how to perform medical tasks.

To look to the future of robotic simulation, we must first look at the past of autonomous vehicle simulation. Companies like Waymo and GM Cruise have really taken the lead by creating highly realistic driving simulations. Virtual vehicles are able to drive around virtual environments and ultimately teach themselves the proper motions, considerations and rules to follow when it comes to self-driving.

Similarly, robotic surgeons will need to learn techniques – perhaps through either supervised or unsupervised deep learning – to complete successful medical procedures. Ultimately this will involve some trial and error, but the point of doing that in a simulated environment is to work out the kinks where the consequences are entirely virtual.

Q: How far away is synthetic data from mainstream use in healthcare?

A: Within three to five years I expect to see research and development for robotic surgery happen in earnest. It may be five to ten years before AI-powered robots are deployed in hospital rooms performing autonomous surgery, but certainly the R&D necessary to do that will begin soon.

For leading healthcare and life science companies, I think it will be important to budget in research and development expenditure to ensure that companies are prepared for the coming advancements in life science and healthcare robotics.

Healthcare is a highly interesting field, and startups and established companies alike are going to have an interest in applied robotics when it comes to taking care of people. There are so many applications for AI, machine learning and computer vision within the field of healthcare that I think it’s important for companies to not bite off more than they can chew.

Synthetic data and simulations as a whole will be challenging to create and will have to be tailored to a particular task, so companies should go after one robotic action at a time rather than trying to solve everything under the sun with AI. A deliberate approach to robotics R&D is a smart thing for healthcare CIOs to consider.

Overall, I am highly optimistic about the future of healthcare and the opportunities for robotics to play a positive role in life sciences. As companies and research universities continue to push the envelope in healthcare, I think we can all appreciate and be inspired by advancements in medical technology. I look forward to seeing how simulation and synthetic data play a positive role in healthcare, and what the future holds for life sciences.

Twitter: @SiwickiHealthIT
Healthcare IT News is a HIMSS Media publication.