Florian Privé PhD Defense on September 5, 2019

Florian Privé, from the BCM team, will defend his PhD on September 5 at 2 pm. The title is:

« Genetic risk score based on statistical learning. »


Place: Amphithéâtre de l’IAB, Site Santé, La Tronche


Jury & Supervision:

    • Julien Chiquet, CR INRA Paris (Reporter)
    • Florence Demenais, DR INSERM Paris (Reporter)
    • Laurent Jacob, CR CNRS Grenoble (Examiner)
    • Benoit Liquet, Prof Univ Pau (Examiner)

    • Hugues Aschard, CR Institut Pasteur (co-Supervisor)
    • Michael Blum, DR CNRS Grenoble (Supervisor)
 

Abstract:
 
Genotyping is becoming cheaper, making genotype data available for millions of individuals. Moreover, imputation makes it possible to obtain genotype information at millions of loci, capturing most of the genetic variation in the human genome. Given such large data and the fact that many traits and diseases are heritable (e.g. 80% of the variation of height in the population can be explained by genetics), it is envisioned that predictive models based on genetic information will be part of personalized medicine.

In my thesis work, I focused on improving the predictive ability of polygenic models. Because predictive modeling is part of a larger statistical analysis of datasets, I developed tools to allow flexible exploratory analyses of large datasets, in the form of two R/C++ packages described in the first part of my thesis. Then, I developed an efficient implementation of penalized regression to build polygenic models based on hundreds of thousands of genotyped individuals. Finally, I improved the “clumping and thresholding” method, which is the most widely used polygenic method and is based on summary statistics, which are more widely available than individual-level data.
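
To make the “clumping and thresholding” idea concrete, here is a minimal Python sketch of a simplified version of the procedure. It is an illustration only, not the implementation from the thesis: the function names, the default thresholds (`r2_max`, `p_max`), and the choice to compute correlations over all variant pairs (rather than within a genomic window) are assumptions made for brevity.

```python
import numpy as np

def clump_and_threshold(G, pval, r2_max=0.2, p_max=0.05):
    """Return indices of variants kept by a simplified C+T procedure.

    G    : (n_individuals, n_variants) genotype matrix coded 0/1/2
    pval : (n_variants,) p-values taken from summary statistics
    Assumption: no window is used, so every pair of variants is compared.
    """
    r2 = np.corrcoef(G, rowvar=False) ** 2   # pairwise squared correlations
    order = np.argsort(pval)                 # most significant variant first
    removed = np.zeros(G.shape[1], dtype=bool)
    kept = []
    for j in order:
        if removed[j]:
            continue
        kept.append(j)
        # Clumping: discard variants too correlated with index variant j.
        removed |= r2[j] > r2_max
    kept = np.array(kept)
    # Thresholding: among clumped variants, keep the significant ones.
    return kept[pval[kept] <= p_max]

def polygenic_score(G, beta, keep):
    """Sum of genotypes weighted by effect sizes over the kept variants."""
    return G[:, keep] @ beta[keep]
```

Given a genotype matrix `G`, effect sizes `beta` and p-values `pval`, the score would be obtained with `polygenic_score(G, beta, clump_and_threshold(G, pval))`. In practice the two thresholds are hyper-parameters, which is where parameter tuning comes in.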

Overall, I applied many concepts of statistical learning to genetic data. I used extreme gradient boosting for imputing genotyped variants, feature engineering to capture recessive and dominant effects in penalized regression, and parameter tuning and stacked regressions to improve polygenic prediction. Statistical learning is not widely used in human genetics and my thesis is an attempt to change that.
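
As an illustration of the feature engineering mentioned above, the following Python sketch expands additive genotype codes (0/1/2) into extra dominant and recessive columns that a penalized regression could then select from. The function name `expand_genotypes` and the exact coding (dominant as 0/1/1, recessive as 0/0/1) are assumptions chosen for the example and may differ from what the thesis uses.

```python
import numpy as np

def expand_genotypes(G):
    """Append dominant and recessive codings to an additive genotype matrix.

    G : (n_individuals, n_variants) array coded 0/1/2 (count of one allele).
    Returns an (n_individuals, 3 * n_variants) array laid out as
    [additive | dominant | recessive].
    """
    G = np.asarray(G)
    additive = G.astype(float)              # 0, 1, 2
    dominant = (G >= 1).astype(float)       # 0, 1, 1: one copy is enough
    recessive = (G == 2).astype(float)      # 0, 0, 1: two copies are needed
    return np.hstack([additive, dominant, recessive])
```

With a sparsity-inducing penalty, columns that do not help prediction can receive a zero coefficient, so tripling the number of features does not necessarily triple the size of the final model.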
 

Keywords:  Statistics, Genomics, Algorithms