Identifying SLE-Related Health Demands Using Data Mining, Big Data
The age of electronic communication has enabled unparalleled access to information, irrespective of geographic location. With a few keystrokes, individuals with acute symptoms or chronic diseases can search for possible causes and treatments and share their findings on the internet. This searching and sharing of health-related information can be harnessed to provide insight into disease trends, including emerging epidemics. “Infodemiology” and “infoveillance” are terms used to describe an emerging approach to public health research, based on Google Trends big data monitoring and data mining. Infodemiology is the description and analysis of health information and communication patterns gathered from data generated by electronic media.1 Infoveillance is the translation of infodemiology data for surveillance purposes. Examples of infodemiology applications include prediction of disease outbreaks (eg, influenza) through the analysis of internet search volumes, syndromic surveillance through monitoring of social media postings, analysis of disparities in availability of health information, and “seasonability of autoimmune diseases.” 1,2
Analyzing how people search and navigate the internet for health-related information, and how they communicate and share this information, can provide valuable insight into a population's health-related behaviors. The traditional approach to public health epidemiologic surveillance relies on the retrospective analysis of data from clinical, diagnostic, and mortality trends to estimate and plan for future events. Data mining and big data analysis of internet searches allow the identification of disease trends in near real time. Data from the Google Trends tool for flu have been leveraged to detect regional outbreaks of influenza up to 10 days sooner than possible with conventional surveillance, potentially offering a powerful tool for the tracking of infectious and noninfectious diseases.3 Such near real-time tracking of health information trends allows the identification of gaps between the supply of and the demand for information.
Rheumatology Advisor interviewed Savino Sciascia, MD, PhD, assistant professor at the Center of Research of Immunopathology and Rare Diseases, University of Turin, Italy, and co-author of studies applying big data analysis to SLE.2,4 We asked Dr Sciascia: Whereas highly prevalent conditions such as seasonal influenza generate data that can be used for public health planning, can meaningful insight be drawn from big data analysis for rare diseases such as systemic lupus erythematosus (SLE)? According to Dr Sciascia, “Collecting homogenous data is challenging and, due to the low prevalence of rare diseases, funding might be arduous. For similar reasons, most information on prevalence are based on registry data.” He added, “By extrapolating relative search data, one can obtain relative information on how patients navigate the Web and how the spreading of information on rare conditions is affecting clinicians and patients in different areas of the globe.”
Individuals with SLE frequently rely on internet searches for information about the disease and its management. Data mining has been applied to rare diseases to explore the distribution of systemic autoimmune diseases in different populations. A recent analysis of Google Trends data generated over a 10-year period identified a correlation between the search terms “lupus” and “relapse,” and between “lupus” and “fatigue” in the Northern hemisphere (P =.019; P =.003, respectively), and a significant correlation between “relapse” and “fatigue” in both the Northern and Southern hemispheres (P <.001; P =.018, respectively).4
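To illustrate the kind of analysis described above, the following is a minimal Python sketch of how the association between two relative search volume series might be quantified. It is not the study authors' method: the article does not state which correlation statistic was used, so the Pearson coefficient here is an assumption, and the function name `pearson_r` and the two monthly series are hypothetical values invented for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly relative search volumes (0-100 scale), NOT real data
lupus = [62, 70, 81, 100, 88, 74, 65, 60, 58, 66, 72, 79]
fatigue = [55, 63, 77, 95, 90, 70, 61, 54, 50, 60, 69, 75]

r = pearson_r(lupus, fatigue)
print(f"r = {r:.3f}")
```

A coefficient near 1 would indicate that interest in the two terms rises and falls together over time. A published analysis would also report a P value, as the study above does, and would need to account for the fact that search interest is a relative rather than an absolute measure.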
Another study used a Google-driven big data search to examine systemic autoimmune diseases and found significant differences in the geographic, age, gender, and ethnic distribution of these diseases, with a higher frequency of SLE, inflammatory myopathies, and Kawasaki disease in African-American patients.5 These findings can be used to better target disease-related information and treatment.
In a longitudinal analysis of data gathered from scientific databases (SCOPUS, Medline/PubMed, ClinicalTrials.gov) and Google Trends over a 5-year period, the search terms “SLE” and “lupus” indicated a geographic heterogeneity in populations' SLE-related behaviors.2 This heterogeneity seemed to be influenced by the search engine, available publications, new treatment options, and celebrity culture. For instance, the peaks in Google Trends searches were closely linked with celebrities affected by SLE and correlated with the approval of belimumab, the first drug approved for the disease in more than 50 years.2 Search volumes on Medline/PubMed, SCOPUS, and ClinicalTrials.gov were less represented in South America, Canada, Australia, and South Africa, perhaps reflecting lower research activity and publication volume, according to the study's authors. By contrast, higher search volumes from Medline/PubMed, SCOPUS, and ClinicalTrials.gov were observed in China, whereas search volumes from Google Trends were less represented in the region. This study supports the notion that data mining and big data monitoring may provide insight into the health information-seeking behavior of patients with SLE.
Infodemiology provides a novel approach for the investigation of disease trends in near real time. Questions and uncertainties remain about the consistency of research design, the gathering of data, and the analysis and interpretation of results.
“This methodology has some intrinsic limitations,” noted Dr Sciascia, adding, “available data are based on a sample of Web searches, with the potential for nonrepresentative sampling bias and differences in access to the Web. Consequently, the calculation of the search value index is dependent on several mathematic assumptions and approximations in search traffic.” In addition, the ethical framework for such approaches is yet to be defined clearly, particularly with respect to informed consent for the use of personal information posted online.6,7
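To make the point about assumptions concrete, the following Python sketch shows one way a relative search index of the kind Google Trends reports can be derived: each period's searches for a term are divided by the total searches in that period, and the resulting shares are rescaled so that the peak equals 100. This is a simplified illustration, not Google's actual implementation. The function name `relative_search_index` and all counts are hypothetical, and the real tool works from a sample of searches rather than complete counts.

```python
def relative_search_index(term_counts, total_counts):
    """Scale a term's share of all searches so the peak period equals 100.

    term_counts  -- searches for the term in each period (hypothetical)
    total_counts -- all searches in the same region and period (hypothetical)
    """
    if len(term_counts) != len(total_counts):
        raise ValueError("both series must cover the same periods")
    if any(total <= 0 for total in total_counts):
        raise ValueError("total search counts must be positive")
    # Step 1: share of all search traffic, which removes the effect of
    # overall growth or decline in internet use
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0 for _ in shares]
    # Step 2: rescale relative to the peak share and round to whole numbers
    return [round(100 * share / peak) for share in shares]


# Hypothetical weekly counts, NOT real data
term_counts = [120, 300, 150]
total_counts = [100_000, 300_000, 100_000]

index = relative_search_index(term_counts, total_counts)
print(index)
```

In this example, the second week has the most raw searches for the term but the lowest index value, because total search traffic was also highest that week. The index therefore conveys relative interest only, which is one reason absolute disease burden cannot be read directly from such data.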
1. Eysenbach G. Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet. J Med Internet Res. 2009;11(1):e11.
2. Sciascia S, Radin M. What can Google and Wikipedia can tell us about a disease? Big Data trends analysis in systemic lupus erythematosus. Int J Med Inform. 2017;107:65-69.
3. Carneiro HA, Mylonakis E. Google trends: a web-based tool for real-time surveillance of disease outbreaks. Clin Infect Dis. 2009;49(10):1557-1564.
4. Radin M, Sciascia S. Infodemiology of systemic lupus erythematous using Google Trends. Lupus. 2017;26(8):886-889.
5. Ramos-Casals M, Brito-Zerón P, Kostov B, et al. Google-driven search for big data in autoimmune geoepidemiology: analysis of 394,827 patients with systemic autoimmune diseases. Autoimmun Rev. 2015;14(8):670-679.
6. Norval C, Henderson T. Contextual Consent: Ethical Mining of Social Media for Health Research. Published January 26, 2017. Accessed November 21, 2017.
7. Linares-Orozco R. Understanding “Consent” in the Age of Big Data and Human Research. CORE. Published June 12, 2017. Accessed November 21, 2017.