Genomic prediction of adaptation: statistical developments and application to an invasive species of crop pest, Drosophila suzukii

Keywords: Hierarchical Bayesian models, Deep learning, Population genomics, Adaptation, Genomic prediction, Drosophila suzukii

Summary: By combining genomic and environmental data obtained from a wide range of populations assumed to be locally adapted, one can learn which alleles influence adaptation to a given environment and thus predict the adaptive potential of a genotyped target population in a new environment. One important application of this approach is to predict the risk that a population of a (pest) species of interest represents in a given environment: in particular, the risk that an invasive population becomes established in a new geographical area given its present (or future) climate, the potential damage it may cause to a given host plant, or its ability to resist control methods. Hierarchical Bayesian models (HBMs) and random forests are statistical approaches that have already been used in this context. During this PhD, particular attention will be paid to the BayPass method (Gautier 2015), currently used to detect loci that facilitate adaptation to an environmental variable, as it can be extended to predictive tasks. The BayPass modelling framework is based on Bayesian linear models. It has the important property of accounting for population structure, which is a major confounder in this context. However, this approach assumes a linear relationship between allele frequencies and environmental variables and does not explicitly model interactions between loci. Therefore, alternative approaches based on neural networks will be considered, as they can capture more complex associations between the genome and the environment and are well suited to the analysis of high-dimensional heterogeneous data.
To compensate for the small number of observations that generally characterizes population genomic prediction studies (an observation here being a population), it would be worthwhile to implement so-called hybrid-AI methodologies that integrate additional information, for example on the structure or annotation of the genome, which is well known for some pest species. Finally, it would be worthwhile to explore frugal AI approaches that allow the training set to be expanded to include unlabelled samples (i.e. populations for which the phenotype or environment has not been measured).

Profile and skills: This interdisciplinary project will involve concepts from population genetics, statistics and machine learning. A strong background in at least one of these three fields is expected. A large part of the work will be devoted to the analysis and simulation of high-throughput genomic data, which requires a strong interest in modelling, computational approaches and programming.

Funding: A 3-year PhD grant is available from September 2024 through the target project “Agrostat” of the PEPR “MathsVives”.

Application and contact: To apply for this position, please send a CV and a motivation letter to the two contact persons below.