An annotation tool based on natural language processing provided better agreement between raters and was faster than manual reviews for phenotyping cognitive status.
“Cognitive status phenotyping is a challenging task because dementia often goes undiagnosed, and identifying signs of cognitive decline in EHRs involves reading clinician notes and combining them with other information in the patient’s chart, such as their problem lists, medications, care coordination notes, and MRI orders,” explains Sudeshna Das, PhD.
Dr. Das adds that clinicians often use a wide range of terms and phrases that can easily be overlooked in manual exams. “Natural language processing (NLP) has the ability to automatically detect cognition-related patterns and sentences, reducing the chance that the annotator will miss information relevant to the decision-making task,” she says.
For a study published in the Journal of Medical Internet Research, Dr. Das and colleagues investigated whether NLP-powered semi-automatic annotation could improve the speed and inter-rater reliability of chart reviews for phenotyping cognitive status. Clinicians judged the cognitive status of patients using the semi-automated NLP-powered Annotation Tool (NAT) or traditional chart reviews. The patient records contained EHR data from two groups at Mass General Brigham: Medicare beneficiary records from the Mass General Brigham Accountable Care Organization (ACO dataset) and records spanning the 2 years before a COVID-19 diagnosis up to the date of that diagnosis (COVID-19 dataset).
Researchers summarized diagnostic codes, medications, and laboratory test results, and processed clinical notes through an NLP pipeline. Cognitive status was labeled as normal, impaired, or indeterminate, and the team compared assessment time and inter-rater agreement between NAT and manual chart reviews for cognitive status phenotyping.
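The study does not publish its term list or pipeline code, but the core idea of semi-automated annotation — surfacing cognition-related passages so a reviewer reads highlighted spans rather than the full note — can be sketched as follows. The lexicon and function name here are illustrative assumptions, not the study's actual NLP components:

```python
import re

# Illustrative lexicon only -- the study's actual NLP pipeline and
# cognition-related term list are not described in detail here.
COGNITION_TERMS = [
    "dementia", "memory loss", "cognitive decline",
    "MMSE", "confusion", "forgetful",
]

def highlight_cognition_mentions(note, terms=COGNITION_TERMS):
    """Return (term, character_offset) pairs for cognition-related
    phrases, so a reviewer can jump to relevant passages instead of
    reading the entire clinical note."""
    hits = []
    for term in terms:
        for match in re.finditer(re.escape(term), note, flags=re.IGNORECASE):
            hits.append((term, match.start()))
    # Present hits in reading order.
    return sorted(hits, key=lambda h: h[1])

note = "Patient reports memory loss; MMSE 22/30 at last visit."
print(highlight_cognition_mentions(note))
```

In this sketch the final label (normal, impaired, or indeterminate) would still be assigned by the human annotator; the tool only reduces the chance that relevant text is overlooked, which is the mechanism Dr. Das describes.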
Faster NAT results, with better consensus
Dr. Das and colleagues included 627 patients in the study (ACO dataset, N=100; COVID-19 dataset, N=527). Patients in the COVID-19 dataset were less likely to have an ICD code for dementia.
“NAT adjudication resulted in greater agreement between raters (Cohen κ, 0.89 vs. 0.80) and was significantly faster (time difference: mean, 1.4 minutes; P<0.001) compared with manual chart reviews,” notes Dr. Das. “NAT adjudication provided assessments that had stronger clinical consensus due to its integrated understanding of highlighted and relevant information and semi-automated features of NLP.”
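The Cohen κ values quoted above measure chance-corrected agreement between two raters assigning the same labels (normal, impaired, indeterminate) to the same patients. A minimal implementation, with hypothetical labels for the usage example, looks like this:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    labeling the same items. Returns 1.0 for perfect agreement and
    0.0 for agreement no better than chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for two raters (not data from the study):
a = ["normal", "normal", "impaired", "impaired"]
b = ["normal", "normal", "impaired", "normal"]
print(cohens_kappa(a, b))  # 0.5
```

A κ of 0.89 versus 0.80 therefore indicates that disagreements between raters, beyond what chance alone would produce, were roughly halved when NAT was used.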
The COVID-19 dataset served as a case-control study that examined the association between pre-existing cognitive impairment and adverse events related to COVID-19, says Dr. Das. “It was used as an example of a research cohort that required cognitive status labeling.”
The cognitive status of 21.1% of patients in the COVID-19 dataset was indeterminate, indicating that there was little information available in the EHR to determine the cognitive status of this group.
Use of NAT in clinical settings
The tool used in the study “is primarily intended for annotating research cohorts, but it can be used to identify patients with cognitive issues who may not have a formal diagnosis in their records,” according to Dr. Das.
“Tools that filter the EHR for warning signs and present the condensed information to providers may prove to be an important step for early intervention,” she says. “In preoperative settings, baseline cognitive impairment is often missed and increases the risk of delirium in elderly patients by up to five times. Early recognition of risk using our tool can enable preventative measures that reduce the incidence, severity, and/or duration of delirium. The tool can also be used in hospital or emergency settings to reduce the costs of routine screening.”
Future research is needed before NAT can be used in larger patient populations, she continues.
“Although NAT improves cognitive status assessment over manual chart reviews, it is not yet scalable to large datasets with thousands of patients,” Dr. Das notes. “To scale to this extent, fully automated machine learning algorithms that replicate the adjudication process are needed. In future work, we plan to use NAT to develop benchmark datasets for training and validating such machine learning algorithms for cognitive status phenotyping.”