Methods in Medical Informatics

Medical Data Science

Recent advances in high-throughput technologies have led to an exponential increase in biological data, such as genomic, epigenomic, and proteomic data. Extracting meaningful insights from such large data collections requires efficient statistical learning methods. We are interested in developing and applying new machine learning and statistical learning methods to solve biomedical problems and answer new biomedical questions. Our earlier work focused on proteomic data; our current focus is on clinical, genomic, and epigenomic data.

Application areas include the study of viruses such as HIV, hepatitis C virus, and influenza, as well as the field of epigenetics. In terms of methods, we are interested in:

  • integration of heterogeneous data sets
  • improving interpretability of non-linear estimators
  • efficient learning methods for large data sets

All of these topics fall under the heading “Machine learning for precision medicine”.

HIV


  • Improving HIV coreceptor usage prediction
  • Learning of properties that determine cell entry efficiency of HIV
  • Analysis of broadly neutralizing antibodies against HIV
  • HIV drug resistance prediction

Hepatitis C


  • Modeling the immune response of infected patients

Influenza


  • Analysis of viral evolution

Epigenetics


  • Open chromatin determination
  • Hi-C data analysis

Integrative data analysis

  • Combining data from different data types for clustering subpopulations in cancer
  • Robust supervised analysis of noisy cancer data