This page contains the abstracts contributed to the MLMG2022 workshop.


K01 Computational inference of microbial genotype-phenotype relationships
Alice McHardy
Affiliations: Computational Biology for Infection Research, Helmholtz Centre for Infection Research, Germany.

Abstract: TBA

K02 Machine learning for predicting phenotype from genotype: how well do algorithms capture causal mechanisms?
Nicole Wheeler
Affiliations: Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, UK.

The application of machine learning to predict phenotype from genotype has become very popular over recent years. Unlike older genome-wide association study methods, machine learning algorithms allow us to jointly consider all genetic variation in a population and derive the best solution for predicting a trait of interest from these patterns of variation. However, these algorithms can perform poorly on new populations, and accurate models can be trained on genomic data with all known causal mechanisms stripped out. This leads to the question: when all genetic information is available, how well do machine learning algorithms select causal mechanisms for making predictions, and does the accuracy of a model indicate how well causal variants have been captured?

Link to the presentation.

Contributed talks

Talks are listed in program order.

T01 Predicting antimicrobial resistance genes from phenotypic resistance profiles: a proof-of-concept study
Gabriel Carvalho1, Katy Jeannot2, Patrick Plésiat2, Richard Bonnet3, Laurent Dortet4, François Vandenesch5, Jean-Philippe Rasigade1,5
Affiliations: [1] PHE3ID, Centre International de Recherche en Infectiologie, Institut National de la Santé et de la Recherche Médicale U1111, CNRS Unité Mixte de Recherche 5308, École Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, Lyon, France; [2] CNR Pseudomonas, CHU de Besançon, [3] CNR Entérobactéries, CHU de Clermont-Ferrand, [4] CNR Carbapénémases, Hôpital Bicêtre, Assistance Publique-Hôpitaux de Paris, Centre National de Référence de la Résistance aux Antibiotiques; [5] Institut des Agents Infectieux, Hospices Civils de Lyon, France

Epidemics of antimicrobial resistance are increasingly linked with horizontally transferred antibiotic resistance genes (ARGs), prompting the need to monitor ARGs rather than specific bacterial strains for epidemic surveillance. Whole genome sequencing (WGS) has become popular, but its current cost prevents its systematic use for ARG surveillance. We explore the feasibility of predicting ARGs from readily available antimicrobial susceptibility profiles of bacteria generated by diagnostic laboratories. ARG prediction models based on random forests, support vector machines and generalized linear models were trained on an extensive collection of clinically relevant bacteria with diverse antibiotic susceptibility profiles. Model performance evaluation using leave-one-out cross-validation suggests that support vector machines outperform the other methods for this task. The best-performing prediction models were then applied to predict ARG presence in all bacteria diagnosed at a large hospital group over 5 years. The potential benefits and limits of this novel approach for antimicrobial resistance monitoring are discussed.

Link to the paper and to the presentation.

T02 ARSENAL: Antimicrobial ReSistance prEdictioN by mAchine Learning approach
Ulysse Guyet1,2, Léa Bientz3, Véronique Dubois3, Jie Feng4, Jacques Corbeil5,6, Alexis Groppi1,2, and Macha Nikolski1,2
Affiliations: [1] Univ. Bordeaux, CNRS, IBGC, UMR 5095, Bordeaux, 33077, France; [2] Univ. Bordeaux, Centre de Bioinformatique de Bordeaux (CBiB), Bordeaux, 33076, France; [3] MFP, CNRS 5234, Université de Bordeaux, Bordeaux, F-33076, France; [4] State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China; [5] Research Center in Infectious Diseases, CHU de Québec-Laval University Research Center and Department of Molecular Medicine and Big Data Research Centre, Faculty of Medicine, Laval University, Quebec City, QC, Canada; [6] Department of Molecular Medicine, Laval University, Quebec City, QC, Canada

Antimicrobial resistance (AMR) has become a major public health concern due to the rapid emergence of multidrug-resistant bacteria, causing serious problems for the prevention and treatment of persistent infections. Algorithms that predict phenotypic variation, such as AMR, could be of major clinical importance, could be more reliable and efficient than traditional phenotyping, and could contribute to the discovery of previously unknown AMR pathways. The significant increase in available sequencing and associated phenotypic data in recent years provides the basis for developing such methods. Here, we developed a machine learning method, ARSENAL, for predicting the minimum inhibitory concentration (MIC) of several antibiotics from genomic data. ARSENAL relies on the one hand on the sequence (k-mers), and on the other hand on the genome structure (gene composition) and the gene orthology links between strains of the same species. Functional interpretation of the most predictive features confirmed the biological relevance of the ARSENAL model.
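
As a reader's illustration of the sequence-based (k-mer) features mentioned in this abstract, the sketch below shows one common way to turn genome sequences into a presence/absence matrix. It is not the ARSENAL implementation: the function names, the choice of overlapping k-mers and the binary encoding are all assumptions made for this example.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count overlapping k-mers in a DNA sequence (illustrative helper)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_presence_matrix(genomes: dict[str, str], k: int) -> tuple[list[str], dict[str, list[int]]]:
    """Build a binary presence/absence row per genome over all observed k-mers.

    `genomes` maps a (hypothetical) strain name to its sequence.
    """
    counts = {name: kmer_counts(seq, k) for name, seq in genomes.items()}
    # The shared vocabulary is the sorted union of k-mers seen in any genome.
    vocabulary = sorted(set().union(*counts.values()))
    matrix = {
        name: [1 if kmer in c else 0 for kmer in vocabulary]
        for name, c in counts.items()
    }
    return vocabulary, matrix


if __name__ == "__main__":
    vocab, rows = kmer_presence_matrix({"strain_a": "ACGT", "strain_b": "CGTA"}, k=3)
    print(vocab)
    print(rows)
```

Rows of such a matrix could then serve as inputs to a regression model for MIC values; real pipelines typically use much longer k-mers and dedicated counting tools.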

Link to the paper and to the presentation.

T03 BenchmarkDR: A modular and expandable benchmarking pipeline for machine learning based antimicrobial resistance prediction
Niklas Stotzem1, Fernando Guntoro1, and Leonid Chindelevitch1
Affiliations: [1] Imperial College London, United Kingdom

Access to Next Generation Sequencing data has raised interest in the application and development of machine learning methods for antimicrobial resistance (AMR) prediction. The diversity of algorithms, as well as of possible representations of the genome in terms of different features, leaves researchers with the problem of comparing new methods to existing ones or choosing the appropriate method for their data. To give them a helpful tool, we have developed BenchmarkDR (https://github.com/WGS-TB/BenchmarkDR), a modular and easily extendable end-to-end pipeline to benchmark the prediction performance of the variety of available methods. Currently, BenchmarkDR supports the preprocessing of raw genomic sequencing input data into three different representations and the training and evaluation of 16 binary classification methods for categorical predictions and 8 regression methods for MIC predictions. Its modular design makes it easily extendable with other preprocessing approaches and prediction methods. We believe it represents a valuable addition to the AMR prediction toolkit and will provide useful insights into the methods’ relative strengths and weaknesses on a variety of bacterial datasets.

Link to the paper and to the presentation.

T04 Inferring effective population sizes of bacterial populations while accounting for unknown recombination and selection: a deep learning approach
Jean Cury1,4, Théophile Sanchez1, Erik Bray1, Jazeps Medina-Tretmanis2, Maria Avila-Arcos3, Emilia Huerta-Sanchez2, Guillaume Charpiat1, and Flora Jay1
Affiliations: [1] Université Paris-Saclay, CNRS, INRIA, Laboratoire Interdisciplinaire des Sciences du Numérique, 91400, Orsay, France; [2] Center for Computational Molecular Biology, Brown University, Providence, RI, USA; [3] International Laboratory for Human Genome Research, Universidad Nacional Autónoma de México (UNAM), Querétaro, México; [4] SEED, U1284, INSERM, Université de Paris, Paris, France

Inferring population size through time is a long-standing problem in population genetics. It consists, essentially, of reconstructing the past demography of a population from a present-day sample of that population. Many types of methods have been developed over decades, but deep learning based methods have only recently started to emerge. It has been shown, however, that classical methods do not work for bacterial populations, because the underlying assumptions of these methods are not satisfied. Here, we design an end-to-end deep learning approach that accounts for unknown recombination and selection events, and evaluate how it performs on bacterial populations. We also propose various improvements to this framework, such as implementing uncertainty estimation.

Link to the paper and to the presentation.

T05 Hierarchical machine learning predicts geographical origin of Salmonella within four minutes of sequencing
Sion Bayliss1, Rebecca K. Locke1,2, Claire Jenkins3, Marie Anne Chattaway3, Timothy Dallman4 and Lauren A. Cowley1
Affiliations: [1] Milner Centre for Evolution, Department of Biology & Biochemistry, University of Bath, UK; [2] Genomic Laboratory Hub (GLH), Addenbrooke’s Hospital, Cambridge University Hospitals NHS Foundation Trust, UK; [3] Gastrointestinal Reference Services, UK Health Security Agency, Colindale, UK; [4] Institute for Risk Assessment Sciences (IRAS), Utrecht University, 3508 TD Utrecht, Netherlands

Salmonella enterica serovar Enteritidis is one of the most frequent causes of salmonellosis globally and is commonly transmitted from animals to humans by the consumption of contaminated foodstuffs. Rapid geographical source attribution of suspect food vehicles facilitates outbreak management. In this study, 2,313 S. Enteritidis genomes collected by the UKHSA between 2014 and 2019 were used to train a hierarchical machine learning classifier to predict the geographical origin of isolates for 38 countries. The highest classification accuracy was achieved at the continental level, followed by the sub-regional and country levels (macro F1: 0.954, 0.718 and 0.661, respectively). Longitudinal analysis and validation with publicly accessible international samples indicated that predictions were robust to prospective external datasets. This hierarchical machine learning framework provides granular geographical source prediction directly from sequencing reads in <4 minutes per sample, facilitating rapid outbreak resolution and real-time genomic epidemiology.
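
For readers unfamiliar with the macro F1 metric reported in this abstract, the sketch below shows how it is usually defined: the unweighted mean of per-class F1 scores, so that rare classes count as much as common ones. The abstract does not state how the authors computed it; the function name and the toy labels are assumptions for illustration only.

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in truth or prediction."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy continental-level labels, invented for this example.
    truth = ["Europe", "Europe", "Asia", "Africa"]
    predicted = ["Europe", "Asia", "Asia", "Africa"]
    print(round(macro_f1(truth, predicted), 3))
```

In a hierarchical setting, the same function could be applied separately to the labels at each level (continent, sub-region, country) to obtain one score per level.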

Link to the paper and to the presentation.