
Uncovering hidden patterns in dementia that might save lives

November 3, 2017


By Darshak Sanghavi, MD and Samantha Noderer, MA

Decades ago, doctors had to rely on personal experience and luck to make discoveries.

For example, around World War II, a Dutch pediatrician noticed that some of his patients, who were malnourished despite being well-fed, suddenly improved when a grain shortage hit their community.1 The doctor unknowingly stumbled on celiac disease. Once he realized that gluten in wheat had caused the illnesses, he was able to effectively treat his patients.

Today, instead of relying on serendipity to understand mysterious conditions, we have the potential to uncover hidden trends in the massive amounts of data created by electronic medical notes and records.  

One way OptumLabs® is trying to do this is through a collaborative program with the Global CEO Initiative on Alzheimer’s Disease.2 OptumLabs and several research and expert partners are exploring the roots of Alzheimer’s disease and other dementias.

Rather than waiting for a chance observation, OptumLabs and our partners are using advanced data science techniques to organize vast amounts of information for research and to visualize new, meaningful patterns in dementia.

Paper trails to computer processing

Over the past two decades, electronic health records (EHRs) have transformed health care by recording information digitally. Gone are the often illegible handwritten notes that made it impossible to measure or monitor trends across hundreds or thousands of patients. Digital records now make it possible to present and interpret large amounts of data.

While we’re now able to read what the notes say, we’re still challenged in seeing trends across large populations. To do that, we need a way to take the unstructured, narrative text in electronic notes and turn it into an easier-to-analyze spreadsheet. That’s where “natural language processing” comes in.

Natural language processing (NLP) uses a combination of linguistic, statistical and exploratory methods to analyze text via computer programs and organize it for research.3 Here’s an example of how thousands of charts turn into spreadsheets via NLP.

In the OptumLabs clinical data, NLP-derived phrases are separated into tables based on their content. The Signs, Diseases and Symptoms (SDS) table, for example, lists each medical concept that relates to the patient, along with details such as the location on the patient’s body, features such as severity, whether the concept is confirmed or denied, and other notes.
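To make the idea concrete, here is a minimal Python sketch of how a narrative note might be turned into rows like those in the SDS table. It is a toy, rule-based illustration and not the OptumLabs pipeline: the term lists, the `SdsRow` field names and the `extract_rows` function are all assumptions made for this example, and real clinical NLP relies on far richer linguistic and statistical methods.

```python
import re
from typing import List, NamedTuple, Optional


class SdsRow(NamedTuple):
    """One structured row; the field names are hypothetical."""
    concept: str
    location: Optional[str]
    attribute: Optional[str]
    status: Optional[str]
    section: Optional[str]


# Hypothetical vocabularies, chosen only for this illustration.
CONCEPTS = ("fall risk", "memory loss", "confusion")
SEVERITIES = ("mild", "moderate", "severe")
NEGATIONS = ("no", "denies", "without", "negative for")


def extract_rows(note: str, section: Optional[str] = None) -> List[SdsRow]:
    """Scan each sentence for known concepts and nearby qualifiers."""
    rows = []
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        lowered = sentence.lower()
        for concept in CONCEPTS:
            position = lowered.find(concept)
            if position == -1:
                continue
            # Only look at words that come before the concept in the sentence.
            before = lowered[:position]
            severity = next(
                (s for s in SEVERITIES if re.search(rf"\b{s}\b", before)), None
            )
            negated = any(
                re.search(rf"\b{re.escape(n)}\b", before) for n in NEGATIONS
            )
            rows.append(
                SdsRow(concept, None, severity, "negative" if negated else None, section)
            )
    return rows


if __name__ == "__main__":
    note = "Patient denies memory loss. Moderate fall risk noted."
    for row in extract_rows(note, section="psychosocial"):
        print(row)
```

Even in this simplified form, the key step is visible: free text goes in, and rows with the same columns come out, so thousands of notes can be stacked into one table.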

Here we can see the NLP table filtered to mentions of fall risk. Location does not apply to this concept, so that column is empty (null).

Medical concept   Location   Feature    Confirmed or denied   Other notes
fall risk         (null)     Altered    (null)                psychosocial
fall risk         (null)     (null)     negative              psychosocial
fall risk         (null)     (null)     negative              (null)
fall risk         (null)     Moderate   (null)                (null)
fall risk         (null)     (null)     negative              social history
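Once notes are in rows, simple tallies become possible. The sketch below re-enters the five fall-risk rows shown above and counts how many of them are marked negative. The column names, and the reading of "negative" as a denied finding, are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical column names for the five values in each row.
COLUMNS = ("concept", "location", "attribute", "status", "section")

# The five fall-risk rows from the table above; None stands in for (null).
RAW_ROWS = [
    ("fall risk", None, "Altered", None, "psychosocial"),
    ("fall risk", None, None, "negative", "psychosocial"),
    ("fall risk", None, None, "negative", None),
    ("fall risk", None, "Moderate", None, None),
    ("fall risk", None, None, "negative", "social history"),
]
rows = [dict(zip(COLUMNS, values)) for values in RAW_ROWS]


def filter_concept(rows, concept):
    """Keep only the rows that mention the given medical concept."""
    return [row for row in rows if row["concept"] == concept]


def tally(rows, column):
    """Count how often each value appears in one column."""
    return Counter(row[column] for row in rows)


fall_rows = filter_concept(rows, "fall risk")
status_counts = tally(fall_rows, "status")
print(f"{status_counts['negative']} of {len(fall_rows)} fall-risk mentions are marked negative")
```

If that reading is right, a mention in a note is not the same as a finding, so counting raw mentions alone would overstate how many patients were judged to be at risk.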


In short, NLP allows researchers to go from unstructured narrative notes to a structured spreadsheet of consistent data that we can review in a systematic way.

It’s still difficult to make discoveries across a massive table with thousands or millions of rows of data. However, there are helpful techniques that build upon NLP, allowing us to visualize big data and help find patterns.

Mapping patient journeys through clinical note visualizations

The visual communication of data is an invaluable learning tool. It can illustrate facts and relationships in context, revealing a higher level of understanding than text alone. OptumLabs is incorporating this approach in a variety of projects underway. 

We are using a combination of NLP and data visualization to find hidden clues in medical notes showing that people were developing signs of dementia before they were diagnosed. This is important because there may be measures that doctors could take to prevent or delay this progressive disease if they had earlier warning.

To start, we used NLP to “read” clinical notes in our de-identified EHR database from more than 40 U.S. provider practice groups or independent delivery networks that serve more than 50 million people.

We looked back at the medical history of patients with dementia and at the related terms that showed up in their medical records one to four years before diagnosis.
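The look-back step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the method used in the project: the `lookback_terms` function, the whole-year buckets and the sample dates and terms are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def lookback_terms(
    diagnosis_date: date,
    mentions: Iterable[Tuple[date, str]],
    min_years: int = 1,
    max_years: int = 4,
) -> Dict[int, List[str]]:
    """Group terms by whole years before diagnosis.

    A mention is kept when it falls at least min_years and less than
    max_years before the diagnosis date. Bucket 1 means "one to two
    years before", bucket 2 means "two to three years before", and so on.
    """
    buckets = defaultdict(set)
    for mention_date, term in mentions:
        years_before = (diagnosis_date - mention_date).days / 365.25
        if min_years <= years_before < max_years:
            buckets[int(years_before)].add(term)
    return {year: sorted(terms) for year, terms in sorted(buckets.items())}


if __name__ == "__main__":
    # Made-up mentions for a single made-up patient.
    mentions = [
        (date(2016, 1, 15), "memory loss"),  # kept: between one and two years before
        (date(2014, 3, 1), "confusion"),     # kept: between three and four years before
        (date(2017, 3, 1), "fall risk"),     # dropped: less than one year before
    ]
    print(lookback_terms(date(2017, 6, 1), mentions))
```

Anchoring every patient to the diagnosis date, rather than to the calendar, is what lets records from different years be compared on the same timeline.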

To visualize the changes over time as patients move closer to their date of diagnosis, we organized the data in an “alluvial flow” diagram. This diagram style gets its name from nature’s alluvial fans, which form as sediment carried by flowing water spreads out from a single point and builds up over time.4

Figure. Alluvial flow diagram of dementia-related signs and symptoms mentioned in clinical notes three years prior to an Alzheimer’s disease or dementia diagnosis. Source: OptumLabs EHR Clinical Notes Data via NLP.

At each time stamp, the black bars are arranged from largest to smallest, and the size of each bar represents the number of patients with that issue mentioned in their chart. The colored flows represent how many patients transition from one state to the next.
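The numbers behind such a diagram are simple counts. The Python sketch below shows one way to compute them, assuming each patient has exactly one state at each time stamp; the `alluvial_counts` function, the state labels and the three sample journeys are hypothetical and are not drawn from the OptumLabs data.

```python
from collections import Counter
from typing import Dict, List, Tuple


def alluvial_counts(
    journeys: Dict[str, List[str]]
) -> Tuple[List[Counter], List[Counter]]:
    """Return bar sizes and flow sizes for an alluvial diagram.

    journeys maps a patient ID to that patient's state at each time stamp.
    bars[t][state] is the number of patients in a state at time t.
    flows[t][(a, b)] is the number who move from a at time t to b at t + 1.
    """
    lengths = {len(states) for states in journeys.values()}
    if len(lengths) > 1:
        raise ValueError("every journey needs the same number of time stamps")
    n_steps = lengths.pop() if lengths else 0

    bars = [Counter() for _ in range(n_steps)]
    flows = [Counter() for _ in range(max(n_steps - 1, 0))]
    for states in journeys.values():
        for t, state in enumerate(states):
            bars[t][state] += 1
            if t + 1 < n_steps:
                flows[t][(state, states[t + 1])] += 1
    return bars, flows


if __name__ == "__main__":
    # Made-up journeys across three time stamps.
    journeys = {
        "patient_1": ["no mention", "memory loss", "memory loss"],
        "patient_2": ["no mention", "no mention", "memory loss"],
        "patient_3": ["memory loss", "memory loss", "confusion"],
    }
    bars, flows = alluvial_counts(journeys)
    for t, bar in enumerate(bars):
        # most_common() lists the states from largest to smallest.
        print(t, bar.most_common())
    for t, flow in enumerate(flows):
        print(t, "->", t + 1, dict(flow))
```

A plotting library would then draw each bar with a height proportional to its count and each flow with a width proportional to the number of patients making that transition.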

This visualization helps us understand the early signals that would be useful in future projects focused on prediction, prevention, and treatment of Alzheimer’s disease and related dementias.

We can use this same technique to explore patient journeys in other disease areas and use many types of data such as administrative claims data, longitudinal survey data, or disease registry information.

As we move from big data to bigger data in health care, OptumLabs continues to explore data visualization opportunities that uncover important patterns that would otherwise go unnoticed. In turn, these patterns just might lead to discoveries that save lives.

About the Authors

  • Darshak Sanghavi, MD, is chief medical officer at OptumLabs
  • Samantha Noderer, MA, is communications & translation manager at OptumLabs


  1. Slate. Why Do So Many People Think They Need Gluten-Free Foods? Published Feb. 26, 2013. Accessed Oct. 16, 2017.
  2. Global CEO Initiative on Alzheimer’s Disease.
  3. Nature Reviews Genetics. Mining electronic health records: towards better research applications and clinical care. Published May 2, 2012. Accessed Oct. 16, 2017.
  4. National Geographic Society. Alluvial fans. Accessed Oct. 16, 2017.