Strategies for the statistical analysis of transcriptomic data


To illustrate the variety of strategies applicable to the analysis of transcriptomic data, we first apply exploratory methods (PCA, multidimensional scaling, clustering), modelling methods (ANOVA, mixed models, tests) and learning methods (random forests) to a dataset from a nutrition study in mice. In a second stage, we use canonical correlation analysis to study the relationships between these results and clinical measurements. Most of the methods yield biologically relevant results on these data. From this experience we conclude that there is no single best approach: one has to find a ‘good’ strategy that combines exploration and modelling, fits the data, and serves the biological purpose. From this point of view, close collaboration between statistician and biologist is essential.

Journal de la Société Française de Statistique