We are developing a novel pipeline, algorithm and computational tool to assist clinicians in assigning the best therapy to individual patients by integrating, in a transparent way, multiple types of patient data (including high-dimensional genomics, electronic health records (EHR), treatment and imaging data). The underlying method builds similarity clusters of patients. Patient similarity networks are an emerging paradigm for precision medicine, in which patients are clustered or classified based on their similarities in various features, including genomic profiles.

The endpoint of this project is a BigMed dashboard tool that will support prediction of clinical outcomes for colorectal cancer based on patient clinical and laboratory data. In clinical practice, decisions about alternative therapies for colorectal cancer patients are taken by the clinician on the basis of a broad and complex body of data, which are weighed against each other partly with the support of guidelines and partly by experience. This complex integration of data and knowledge can benefit from more structured algorithmic guidance.

We build on the netDx approach (1), very recently developed by the Computational Network Biology group at the University of Toronto (http://www.baderlab.org). The model hypothesis is that strong similarity between two patients' clinical data implies similar clinical outcomes. Clinical data here means the collection of all measured clinical data from patients in each of the possible outcome categories (for example, “disease free for a period of 12 months” after a certain treatment, or not). A strength of the netDx approach is that no preselection of the data is necessary, because the algorithm itself identifies the components of the data that are useful for the similarity-based classification. netDx must be adapted to each disease, data set and purpose, and domain experts contribute fundamentally to this adaptation. Our novel implementation of netDx will be part of the BigMed dashboard.
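The model hypothesis can be illustrated with a minimal sketch. It is a simplified similarity-weighted vote among the most similar known patients, used here as a stand-in; it is not the algorithm netDx itself uses. The function name `predict_outcome`, the parameter `k`, the patient identifiers and the outcome labels are all hypothetical.

```python
from collections import defaultdict


def predict_outcome(similarities, outcomes, k=3):
    """Predict an outcome from the k most similar patients with known outcome.

    similarities: dict patient_id -> similarity to the new patient, in [0, 1]
    outcomes:     dict patient_id -> known outcome label
    Returns (predicted_label, share_of_weighted_vote).
    Illustrative stand-in only, not the netDx algorithm.
    """
    # Keep the k patients most similar to the new patient.
    neighbours = sorted(similarities, key=similarities.get, reverse=True)[:k]

    # Each neighbour votes for its own outcome, weighted by its similarity.
    votes = defaultdict(float)
    for pid in neighbours:
        votes[outcomes[pid]] += similarities[pid]

    total = sum(votes.values())
    if total == 0:
        return None, 0.0  # no usable neighbours: no prediction
    label = max(votes, key=votes.get)
    return label, votes[label] / total


# Hypothetical similarities of a new patient to four known patients.
similarities = {"p1": 0.9, "p2": 0.8, "p3": 0.2, "p4": 0.1}
outcomes = {
    "p1": "disease_free",
    "p2": "disease_free",
    "p3": "relapse",
    "p4": "relapse",
}
label, share = predict_outcome(similarities, outcomes, k=3)
```

In this toy case the two most similar patients were disease free, so they dominate the vote. The share of the vote gives a rough indication of how strongly the neighbourhood agrees.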

Transparency and interpretability of decisions suggested by algorithms are needed for trust in the clinic. netDx allows clinicians to view the records of matching patients and to see which parameters drive the decision, by quantifying the role that clinical variables play in supporting the similarity decision.

netDx presents several novel and important advantages:

  • netDx naturally handles heterogeneous data; any data type can be converted into a similarity network by defining a similarity measure, and integration of diverse data types is possible.
  • Missing data are naturally handled, since similarity networks built from the available part of the data can still be used; the uncertainty of the classification indicates to the clinician when the available patient data are insufficient for a reliable decision.
  • Similarity networks are conceptually intuitive, and can be visualized for inspection. This allows clinicians to “understand” the logic of the algorithm.
  • Proper feature selection enhances clinical interpretability; biological pathways, for instance, provide insight into disease mechanisms.
  • netDx is the first supervised patient classification system based on patient similarity networks. It has been compared with a diverse panel of machine-learning approaches in predicting survival across four different tumor types and performs as well as or better than them.

These features are directly relevant to the goals and requirements of the BigMed community.
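The first two advantages listed above (any data type can be turned into a similarity network, and missing values are tolerated) can be sketched as follows. This is a minimal illustration under stated assumptions, not the netDx implementation: the similarity measure (one minus the range-scaled absolute difference, averaged over the features measured in both patients), the function names, the feature names, the values and the `min_similarity` cut-off are all hypothetical.

```python
from itertools import combinations


def patient_similarity(a, b, ranges):
    """Average per-feature similarity over features measured in both patients.

    a, b:   dict feature -> numeric value, or None when the value is missing
    ranges: dict feature -> (min, max), used to scale differences
    Returns (similarity in [0, 1], number of shared features), or (None, 0).
    """
    scores = []
    for feature, (lo, hi) in ranges.items():
        x, y = a.get(feature), b.get(feature)
        if x is None or y is None or hi == lo:
            continue  # skip missing values and constant features
        scores.append(max(0.0, 1.0 - abs(x - y) / (hi - lo)))
    if not scores:
        return None, 0  # nothing in common: similarity is undefined
    return sum(scores) / len(scores), len(scores)


def build_network(patients, ranges, min_similarity=0.5):
    """Return {(patient, patient): similarity} for sufficiently similar pairs."""
    edges = {}
    for p, q in combinations(sorted(patients), 2):
        sim, _shared = patient_similarity(patients[p], patients[q], ranges)
        if sim is not None and sim >= min_similarity:
            edges[(p, q)] = sim
    return edges


# Hypothetical patients; p2 has no value for the second feature.
patients = {
    "p1": {"age": 60, "cea": 5.0},
    "p2": {"age": 62, "cea": None},
    "p3": {"age": 40, "cea": 45.0},
}
ranges = {"age": (30, 90), "cea": (0.0, 50.0)}
edges = build_network(patients, ranges)
```

Here p2 is still connected to the other patients through the one feature it shares with them. The number of shared features returned by `patient_similarity` could be carried along as a simple indicator of how much evidence supports each edge.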

Having obtained positive results with netDx during initial testing, we wish to take the tool further into the clinic. The next steps are to automate it further, prepare a dashboard version, validate it on a few test cases and introduce it in several clinical settings, where it can provide guidance on treatment decisions, or on predicting disease development, for a new individual patient. The netDx classification will be based on a database of patient data and outcomes, which we want to prepare for selected clinical environments. The application will have functionality for variable interpretation, and will provide reliability and performance statistics. Additional functionality will allow similarity networks between patients to be queried and visualized in the clinical context.

Activities and deliverables are described below:

  1. An important consideration in a netDx analysis is the method used to map the input data to patient similarity networks. This can be done in alternative ways for a given data set. We will develop, evaluate and apply alternative methods for generating similarity networks. 
  2. Many functional elements of netDx can be tuned to better handle specific use cases. We want to understand this better, and will work with domain experts to tune netDx for the dashboard version.
  3. In addition to classifying new patients, the production version of our similarity tool will be able to tell the clinician the reasons for a specific prediction or classification, and which individual features have been crucial for the prediction.
  4. We will characterise which types of data sets are difficult for netDx to handle. We will determine the minimum number of patients and features needed, and learn how to identify and avoid overfitting. These will become control checks in the production version.
  5. We will quantify the reliability of predictions. This is important because, for some patients, the existing data might be insufficient or too noisy. For such patients we want the tool to raise a cautionary flag.
  6. We want to scale up the number of features and improve the performance of the tool, possibly by using the Colossus HPC facility at TSD.
  7. We will evaluate the robustness of netDx. We will study how the predictions are affected by features that are highly dependent or mistakenly duplicated, and what the consequences are for predictions on new patients.
  8. We will perform a blind validation test with a specialist, comparing the predictions of netDx with those made by an expert clinician.
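The control checks in items 5 and 7 above could take a form like the following sketch. It assumes a classifier that returns one score per outcome class; the function names, the thresholds (`min_margin`, `min_features`, `threshold`) and the example data are hypothetical, and suitable values would have to be determined during validation.

```python
import math


def needs_caution(class_scores, shared_features, min_margin=0.2, min_features=3):
    """Flag a prediction when the top two classes are close or data are sparse."""
    ranked = sorted(class_scores.values(), reverse=True)
    if not ranked:
        return True  # no scores at all: always flag
    margin = ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0]
    return margin < min_margin or shared_features < min_features


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no correlation information
    return cov / (sx * sy)


def near_duplicates(columns, threshold=0.98):
    """List feature pairs whose absolute correlation reaches the threshold."""
    names = sorted(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


# Hypothetical data: the same weight recorded twice in different units.
columns = {
    "weight_kg": [60, 72, 85, 90],
    "weight_lb": [132.3, 158.7, 187.4, 198.4],
    "age": [70, 45, 66, 52],
}
duplicates = near_duplicates(columns)
flag = needs_caution({"disease_free": 0.55, "relapse": 0.45}, shared_features=10)
```

In this example the two weight columns are reported as a near-duplicate pair, and the prediction is flagged because the two class scores are too close. A correlation check only catches linearly dependent numeric features; other kinds of dependence would need other measures.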


References: (1) Pai S, et al. netDx: interpretable patient classification using integrated patient similarity networks. Mol Syst Biol. 2019 Mar 14;15(3):e8497.