The Health Data Silo Challenge
July 25, 2023
Whether through predictive analytics, artificial intelligence, or data science, the ways in which health data can be used are constantly advancing. With increasing digitization, mobile health applications, wearables and medical IoT, more and more data about our health is generated every day. This data can be used to identify trends and patterns that can improve our overall health and well-being, unlock cures for rare diseases and provide evidence for public policy decisions. Unfortunately, despite the powerful analytical tools that exist, the biggest obstacle to leveraging and analyzing health data is access to the data itself.
Access to health data is a complex challenge for several reasons, including the sensitive nature of the data itself, data privacy regulations that differ across geographies, disparate IT systems, and data that still exists only in analogue form.
To address this challenge, different players within the healthcare ecosystem will have to work together.
What are data silos?
Data silos, by definition, “are pockets of information stored in different information systems or subsystems that don’t connect with one another”. In other words, the systems holding the data do not communicate or collaborate, which can lead to inefficiencies, errors, and duplicated work.
Healthcare is particularly susceptible to the silo problem because of the way data is often stored and accessed. Medical records are typically stored in electronic health records (EHRs), which are designed for individual provider use. Wearable and medical IoT data is stored on different systems by different solution providers. Analogue data, being non-digital, is not stored on any system at all. In addition, data may become siloed due to restrictions on use, government regulations and a lack of informed consent from a patient population. This makes it difficult to get a holistic view of a patient or patient population and to make effective use of all the available data.
This creates a significant challenge for researchers who aim to make such data useful to the general population, because they are unable to access comprehensive, diverse and up-to-date datasets. This can slow down the development of new treatments, new algorithms and the like, thereby hindering advances in modern medicine.
Equally important is addressing the misuse of data where explicit patient consent has not been obtained, which has fueled growing mistrust among the general population. A study conducted by the American Medical Association earlier this year highlighted that “75% of patients wanted to opt-in before a company uses any of their health data”.
How do we solve this problem with technology?
To address these challenges, data needs to be better protected and decentralized. Approaches such as zero-knowledge encryption (an encryption process in which your data is secured with a private key that only you hold, so that only you can decrypt it) and differential privacy (a system for sharing information about groups within a dataset without compromising individual-level information) provide a way to protect data while implementing controls on data use and maintaining patient data sovereignty.
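To make the idea of differential privacy more concrete, here is a minimal sketch in Python of one common way to achieve it for a simple counting question: the Laplace mechanism. It is an illustration only, not a production design. The function name `dp_count`, the sample records and the `epsilon` values are all assumptions made for this example.

```python
import random


def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    Hypothetical helper for illustration. Adding or removing one person
    changes a count by at most 1, so the sensitivity is 1 and the Laplace
    noise scale is 1 / epsilon. A smaller epsilon means more noise and
    therefore stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    scale = 1.0 / epsilon
    # The difference of two independent exponential samples with mean
    # `scale` follows a Laplace distribution centred on zero.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Made-up example data: each record is one (fictional) patient.
patients = [
    {"age": 34, "diabetic": True},
    {"age": 51, "diabetic": False},
    {"age": 67, "diabetic": True},
    {"age": 45, "diabetic": True},
]

noisy = dp_count(patients, lambda p: p["diabetic"], epsilon=0.5)
print(f"Noisy number of diabetic patients: {noisy:.1f}")
```

The researcher sees only the noisy total, never the individual rows. How much noise is acceptable is a trade-off between accuracy and privacy that has to be decided for each use case.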
The use of federated databases and in situ data computing provides a way to use data at rest across different institutions and jurisdictions without moving it. This allows data sources to be connected while maintaining data privacy and data security.
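As a rough illustration of computing on data where it sits, the Python sketch below has each institution calculate a small summary locally and share only that summary with a coordinator. The names `LocalSummary`, `summarize_locally` and `combine`, as well as the blood pressure values, are hypothetical and chosen for this example. Real federated systems add authentication, auditing and further privacy safeguards on top.

```python
from dataclasses import dataclass


@dataclass
class LocalSummary:
    """Aggregate that leaves an institution; no patient-level rows."""
    n: int
    total: float


def summarize_locally(values):
    """Runs inside each institution, next to the data at rest."""
    return LocalSummary(n=len(values), total=float(sum(values)))


def combine(summaries):
    """Runs at the coordinator, which only ever sees the summaries."""
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no observations to combine")
    return sum(s.total for s in summaries) / n


# Made-up systolic blood pressure readings held by two hospitals.
hospital_a = [120.0, 135.0, 128.0]
hospital_b = [142.0, 118.0]

pooled_mean = combine([
    summarize_locally(hospital_a),
    summarize_locally(hospital_b),
])
print(f"Mean across both hospitals: {pooled_mean:.1f}")
```

The pooled mean is the same as if the raw readings had been merged into one table, yet neither hospital had to hand over individual records. Note that aggregates from very small groups can still reveal information about individuals, which is one reason this approach is often combined with techniques such as differential privacy.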
Issues around data standardization also need to be addressed, given the many different data standards in use. Moreover, data from different sources lacks a shared identifier that would allow it to be combined into a coherent, aggregated dataset usable for research.
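One possible way to work around a missing shared identifier, sketched here in Python, is for each data holder to derive a pseudonymous linkage token from normalized demographic fields using a keyed hash. This is an assumption made for illustration, not a method described above. The function names `normalize` and `linkage_token`, the choice of fields and the secret key are all hypothetical.

```python
import hashlib
import hmac
import unicodedata


def normalize(text):
    """Reduce spelling variation: strip accents, lowercase, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def linkage_token(secret_key, family_name, given_name, birth_date):
    """Derive a pseudonymous token that is identical across data holders
    that share `secret_key`, without exposing the underlying fields."""
    message = "|".join(
        [normalize(family_name), normalize(given_name), birth_date.strip()]
    )
    return hmac.new(
        secret_key, message.encode("utf-8"), hashlib.sha256
    ).hexdigest()


# Placeholder key; in practice it would be agreed and stored securely.
key = b"example-key-agreed-between-data-holders"

# The same fictional person as recorded by two different systems.
token_ehr = linkage_token(key, "Müller", "Anna", "1980-04-12")
token_wearable = linkage_token(key, "MULLER", " anna ", "1980-04-12")
print(token_ehr == token_wearable)
```

Exact matching of this kind breaks down when records contain typos or missing fields, and anyone holding the key could try to re-identify people by guessing names and birth dates. Key management and governance therefore matter as much as the hashing itself.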