Extraction of text from health records with NLP and ML

This project began with the ambitious aim of reducing the workload for clinicians treating patients with colorectal cancer.

The initial approach was to arrange Design Thinking sessions with different stakeholders. These sessions quickly made it clear that preparing for *MDT meetings is particularly challenging for clinicians, as they have to go through a vast amount of data in the medical records. We therefore explored developing decision support by extracting relevant text and data from medical records using a rule-based approach.


Our work focused on a set of historical medical record entries (n = 1000) from 25 individual patients. We successfully developed algorithms for identifying, classifying and categorizing the correct **TNM classification for each of the 25 patients. Given the small number of patients, we assumed that a rule-based approach would outperform a data-driven one.
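The rules themselves are not described here. As a minimal sketch, assuming TNM codes appear in the notes as strings such as `pT3N1bM0` or `cT3 N1 M0`, a rule-based extractor could combine a regular expression, a hypothetical "most recent mention wins" rule for choosing one classification per patient, and a coarse mapping to stage groups 0 to IV following the AJCC/UICC grouping for colorectal cancer (sub-stages such as IIA or IIIB are omitted). The pattern, the names `extract_tnm`, `latest_tnm` and `coarse_stage`, and the selection rule are illustrative assumptions, not the project's Watson Explorer implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for TNM strings such as "pT3N1bM0", "cT3 N1 M0" or "ypT2 N0 M0".
# Optional prefixes: c (clinical), p (pathological), y (post-therapy), r (recurrence).
TNM_PATTERN = re.compile(
    r"\b(?P<prefix>[ycpr]{0,3})T(?P<t>is|[0-4][a-d]?|x)\s*"
    r"(?:[ycpr]{0,3})N(?P<n>[0-3][a-c]?|x)\s*"
    r"(?:[ycpr]{0,3})M(?P<m>[01][a-c]?|x)\b",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class TNM:
    prefix: str
    t: str
    n: str
    m: str

    def __str__(self) -> str:
        return f"{self.prefix}T{self.t}N{self.n}M{self.m}"


def extract_tnm(text: str) -> list[TNM]:
    """Return every TNM mention in a note, normalized to lower-case values."""
    return [
        TNM(mt.group("prefix").lower(), mt.group("t").lower(),
            mt.group("n").lower(), mt.group("m").lower())
        for mt in TNM_PATTERN.finditer(text)
    ]


def latest_tnm(notes: list[tuple[str, str]]) -> Optional[TNM]:
    """Hypothetical selection rule: the last TNM mention in the most recent note
    that contains one. `notes` holds (ISO date, text) pairs."""
    for _, text in sorted(notes, reverse=True):  # newest date first
        found = extract_tnm(text)
        if found:
            return found[-1]
    return None


def coarse_stage(tnm: TNM) -> Optional[str]:
    """Map TNM to a coarse colorectal stage group (0, I-IV); None if undeterminable."""
    if tnm.m.startswith("1"):
        return "IV"   # any T, any N, M1
    if "x" in (tnm.t, tnm.n, tnm.m):
        return None   # a component could not be assessed
    if tnm.n != "0":
        return "III"  # any T, N1-N2, M0
    if tnm.t == "is":
        return "0"    # Tis N0 M0
    if tnm.t[0] in "12":
        return "I"    # T1-T2 N0 M0
    if tnm.t[0] in "34":
        return "II"   # T3-T4 N0 M0
    return None       # e.g. T0 N0 M0 has no stage group


if __name__ == "__main__":
    # Invented example notes, not project data.
    notes = [
        ("2018-03-02", "CT thorax/abdomen: no metastases. Clinical stage cT3 N1 M0."),
        ("2018-05-14", "Histology after resection: ypT3N0M0, 0/14 lymph nodes."),
    ]
    tnm = latest_tnm(notes)
    print(tnm, coarse_stage(tnm))  # ypT3N0M0 II
```

Preferring the most recent note is only one plausible rule; a real system would likely also weigh pathological (`p`/`yp`) classifications over clinical ones and handle negations and free-text variants that a single regular expression misses.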

The ability to extract information about the stage of a patient's cancer from medical records has been made available for integration with other services through Watson Explorer's built-in ***APIs.


*MDT – Multidisciplinary Teams

**A system for classifying the stage of a cancer. TNM = Tumor, Node, Metastasis

***API - Application Programming Interface




IBM, DIPS, Oslo Universitetssykehus


Machine Learning/AI,

Clinical Decision Support

Tri Cao Vu

IT Consultant | Cognitive Insights



Daniel Haugstvedt

Data Scientist

