Text Mining of Electronic Health Records Can Accurately Identify Systemic Lupus Erythematosus

Researchers presented a text mining algorithm to assess the diagnosis and characteristics of patients with systemic lupus erythematosus.

Text mining of electronic health records (EHRs) can be used to accurately identify and characterize patients with systemic lupus erythematosus (SLE), according to study data published in ACR Open Rheumatology. A text mining algorithm, designed to extract keywords from EHRs, was able to assign a diagnosis of SLE with high sensitivity and specificity. In addition, the algorithm was able to detect potentially life-threatening complications of SLE.

Researchers designed a text mining algorithm to assess EHRs for the diagnosis of 14 different immune-mediated inflammatory diseases (IMIDs) and the presence of 18 relevant symptoms. The algorithm was tested using EHR data from patients who underwent laboratory testing for SLE at a tertiary care center in the Netherlands between 2014 and 2017. To identify SLE and other IMIDs, the algorithm was designed to search the EHRs for keywords and calculate the likelihood of the diagnosis based on the frequency with which each keyword appeared. Keywords were selected based on the clinical components of the SLE Disease Activity Index. The algorithm was also designed to perform context mining for each instance of a keyword: keywords appearing in a negative context were considered evidence against a diagnosis. Algorithm-generated diagnoses were compared to official diagnoses made from laboratory tests. To test the accuracy of the algorithm, a clinical immunologist manually reviewed the EHRs of 100 randomly selected patients, and the immunologist’s diagnoses were compared to those generated by the algorithm.
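The keyword-and-context approach described above can be illustrated with a minimal sketch. This is not the researchers' implementation: the keyword list, the negation cues, the 3-word context window, the 0.5 threshold, and the function names `score_record` and `likely_diagnosis` are all assumptions chosen for illustration.

```python
import re

# Hypothetical keyword and negation lists; the study's actual lists are not given here.
KEYWORDS = {"lupus", "sle"}
NEGATION_CUES = {"no", "not", "without", "excluded"}
WINDOW = 3  # assumed number of preceding words checked for a negation cue


def score_record(text):
    """Count keyword mentions in one note, split into affirmed and negated."""
    tokens = re.findall(r"[a-z]+", text.lower())
    affirmed = negated = 0
    for i, token in enumerate(tokens):
        if token not in KEYWORDS:
            continue
        context = tokens[max(0, i - WINDOW):i]
        if any(cue in context for cue in NEGATION_CUES):
            negated += 1  # keyword in a negative context counts against the diagnosis
        else:
            affirmed += 1
    return affirmed, negated


def likely_diagnosis(notes, threshold=0.5):
    """Flag a patient when the share of affirmed mentions exceeds an assumed threshold."""
    affirmed = negated = 0
    for note in notes:
        a, n = score_record(note)
        affirmed += a
        negated += n
    total = affirmed + negated
    return total > 0 and affirmed / total > threshold


notes = ["Patient has SLE.", "SLE flare with nephritis.", "No signs of lupus."]
print(score_record(notes[2]))   # (0, 1)
print(likely_diagnosis(notes))  # True
```

A real system would need a clinical vocabulary, language-specific negation handling (the study records were from a Dutch center), and a validated decision rule; the sketch only shows how frequency counting and context checking fit together.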

Between 2014 and 2017, 2038 patients underwent testing for the presence of anti-double-stranded DNA (anti-dsDNA) antibodies a total of 4607 times. Mean age at first test was 44.9 years, and 63.4% of patients were women. The algorithm assigned a diagnosis of SLE to a total of 510 patients (25%) across 2726 testing records (59%). Following the evaluation of 100 patient records, the sensitivity and specificity of the algorithm for SLE were estimated at 96.4% and 93.3%, respectively. The clinical immunologist assigned SLE as a diagnosis to 55 of the 100 patients; 53 of these patients were correctly identified by the algorithm. The algorithm was also able to identify nephritis and pleuritis with good sensitivity (both ≥80%) and high specificity (both 97%). However, the algorithm did not perform as well in the identification of less serious symptoms, such as alopecia, mucosal ulcers, and vasculitis.
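The reported accuracy figures can be checked with simple arithmetic. In the sketch below, the true-positive count (53 of 55) comes from the article; the true-negative count of 42 among the remaining 45 patients is not stated in the article and is an assumption inferred from the reported 93.3% specificity.

```python
def sensitivity(true_pos, false_neg):
    """Share of patients with the disease that the algorithm identified."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of patients without the disease that the algorithm cleared."""
    return true_neg / (true_neg + false_pos)


reviewed = 100
sle_by_immunologist = 55
true_pos = 53                                # reported in the article
false_neg = sle_by_immunologist - true_pos   # 2
non_sle = reviewed - sle_by_immunologist     # 45
true_neg = 42                                # assumption: inferred from 93.3%, not reported
false_pos = non_sle - true_neg               # 3

print(round(100 * sensitivity(true_pos, false_neg), 1))  # 96.4
print(round(100 * specificity(true_neg, false_pos), 1))  # 93.3
```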

Results from this study support the use of a text mining algorithm for the identification of SLE and other inflammatory diseases. Text mining algorithms may be useful for assessing large volumes of medical records data, such as those generated in clinical trials. Further research is necessary to evaluate the applicability of this algorithm to other hospitals, particularly those with a different EHR structure.

Study limitations included the limited portability of the machine learning model and the omission of the primary criterion, anti-dsDNA, from the algorithm.

“Our study demonstrates that text mining can be used for performing large-scale research and quality control with clinical data from the EHR in heterogeneous diseases such as SLE,” the researchers concluded.  


Brunekreef TE, Otten HG, van den Bosch SC, Hoefer IE, van Laar JM, Limper M, Haitjema S. Text mining of electronic health records can accurately identify and characterize patients with systemic lupus erythematosus. ACR Open Rheumatol. Published online January 12, 2021. doi:10.1002/acr2.11211