Functionality for clinical Natural Language Processing

Focus Area

Machine Learning/AI

Status: In progress

Partners: University of Oslo, Oslo Universitetssykehus, NTNU, Akershus University Hospital

Current state-of-the-art solutions in Language Technology use machine learning algorithms and large volumes of data to solve tasks that enable understanding of text, such as extracting entities and the relations between them. Although much progress has been made in providing tools for processing unstructured data, BigMed faces two main challenges.

First, data from the medical domain is notoriously difficult to process automatically because of large variation, data sparsity and the high cost of manually annotating such data. Second, processing Norwegian text requires an effort to adapt and develop tools specifically tailored to Norwegian clinical text.

BigMed's NLP group aims to develop tools for processing Norwegian medical text by

(i) reusing and adapting existing tools for general-domain Norwegian text,

(ii) creating reusable medical language resources for Norwegian, and

(iii) applying state-of-the-art machine learning techniques that enable the combination of data sources and generalization to new data.

The NLP effort will focus mostly on the "Sudden Cardiac Death" use case and on collaboration with AHUS. General activities are listed below:

  •  Pre-processing pipeline:
    - sentence splitter
    - tokenizer
    - PoS-tagger
  •  Clinical input representations: training of vector space representations (embeddings) for clinical terms 
  •  Text corpus of clinical text 
  •  Clinical entity detection using machine learning
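
The first two stages of the pre-processing pipeline can be illustrated with a minimal sketch. It assumes simple regular-expression rules; the abbreviation list and the function names `split_sentences` and `tokenize` are hypothetical and do not describe BigMed's actual tools. PoS tagging is left out because it needs a trained model.

```python
import re

# Hypothetical abbreviations whose final period should not end a sentence.
# A real list would be collected from Norwegian clinical text.
ABBREVIATIONS = {"f.eks.", "bl.a.", "dr.", "ca."}

def split_sentences(text):
    """Split after ., ! or ? when followed by whitespace and an uppercase letter."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+(?=[A-ZÆØÅ])", text):
        candidate = text[start:match.end()].strip()
        last_token = candidate.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, keep scanning
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

def tokenize(sentence):
    """Separate words (keeping internal hyphens) from punctuation marks."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)

if __name__ == "__main__":
    note = "Henvist av dr. Hansen. Mor er frisk."
    for sentence in split_sentences(note):
        print(tokenize(sentence))
```

Rule-based splitting like this tends to break on clinical shorthand, which is one reason tools adapted to the domain are needed.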


Use case 1: Family history extraction from clinical text (with OUS)
    - synthetic corpus of family history statements, manually annotated
    - SVM models for family history extraction
Resources: https://github.com/ltgoslo/NorSynthClinical
Publication: Taraka Rama, Pål Brekke, Øystein Nytrø and Lilja Øvrelid: "Iterative development of family history annotation guidelines using a synthetic corpus of clinical text"

Use case 2: Text classification from radiology reports (with AHUS)
    - neural models for classification of radiology reports
    - domain-specific word embeddings trained on clinical text
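
One simple way to feed word embeddings into a report classifier is to average the vectors of a report's tokens into a single fixed-length vector. The sketch below shows only that step and is not necessarily how the project's neural models use embeddings. The three-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration, and the function names are hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only. Real clinical
# embeddings would be trained on a large corpus and have many more dimensions.
TOY_EMBEDDINGS = {
    "fraktur": [1.0, 0.0, 0.0],
    "brudd": [0.9, 0.1, 0.0],
    "normal": [0.0, 1.0, 0.0],
    "funn": [0.0, 0.5, 0.5],
}

def report_vector(tokens, embeddings):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None  # no token of the report is in the vocabulary
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    a = report_vector(["fraktur"], TOY_EMBEDDINGS)
    b = report_vector(["brudd"], TOY_EMBEDDINGS)
    c = report_vector(["normal", "funn"], TOY_EMBEDDINGS)
    print(cosine(a, b), cosine(a, c))
```

With domain-specific embeddings, near-synonyms in clinical language would be expected to end up close together, so reports that use different wording for the same finding get similar vectors.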

Figure 1. Example from synthetic corpus of clinical family history statements.


Partners:  Language Technology Group (IFI, UiO), OUS, NTNU, AHUS  

Lilja Øvrelid

UiO IFI

+47 467 82 458

Relevant Projects

Natural Language Processing (NLP) and Sudden Cardiac Death (SCD)

Sudden Cardiac Death (SCD) accounts for approximately 5,000 deaths yearly in Norway. Once identified, high-risk individuals can be offered life-saving therapy. Selection of patients for this therapy and prediction of SCD is one of the