Bioinfo & Biostats

Bioinformatics and Biostatistics

Last updated on Mon, 31 Aug 2020

While developing transcriptomic approaches during my PhD, I soon felt the need to learn more biostatistical tools so that I could analyze my data thoroughly.

I therefore started a long-term collaboration with mathematicians from the Toulouse Math Institute, in particular:

We first surveyed the statistical methods relevant for analyzing transcriptomic data (Baccini et al., 2005).
To integrate the transcriptomic and lipidomic data acquired during my PhD, Ignacio Gonzalez developed a regularized version of Canonical Correlation Analysis (Gonzalez et al., 2008).
This initial work was then taken in multiple directions by Kim-Anh Lê Cao, who developed the first version of the mixOmics package during her PhD (see, for example, Lê Cao et al., 2009) and still actively leads the development of this package and of new data integration methods.
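The criterion below illustrates what "regularized" means here. It is a sketch of the usual ridge-type formulation, assuming centered data matrices $X$ ($n \times p$) and $Y$ ($n \times q$), covariance blocks $\Sigma_{XX}$, $\Sigma_{YY}$, $\Sigma_{XY}$, and regularization parameters $\lambda_1, \lambda_2 \ge 0$; this notation is introduced for illustration and is not taken from the text. Classical CCA needs the inverses of $\Sigma_{XX}$ and $\Sigma_{YY}$, which are singular when there are more variables than samples, as is typical for transcriptomic data; adding $\lambda_1 I_p$ and $\lambda_2 I_q$ makes them invertible.

$$
\rho_1 = \max_{a \in \mathbb{R}^p,\; b \in \mathbb{R}^q}
\frac{a^\top \Sigma_{XY}\, b}
{\sqrt{a^\top \left(\Sigma_{XX} + \lambda_1 I_p\right) a}\;\sqrt{b^\top \left(\Sigma_{YY} + \lambda_2 I_q\right) b}}
$$

With $\lambda_1 = \lambda_2 = 0$ this reduces to classical CCA; the maximizing pair $(a, b)$ defines the first pair of canonical variates $Xa$ and $Yb$.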

With Sébastien Déjean, we also developed methods to analyze transcriptomic data from time-series experiments (Déjean et al., 2010).
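The idea behind clustering time-series profiles through the derivatives of smoothing splines can be illustrated with a minimal Python sketch. It is not the implementation from Déjean et al. (2010): it assumes a genes × time-points matrix, uses SciPy's `UnivariateSpline` and Ward hierarchical clustering as stand-ins, and the function names and the `smoothing` and `n_grid` parameters are hypothetical. Clustering on the first derivative groups genes by the shape of their temporal response rather than by their absolute expression level.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.cluster.hierarchy import linkage, fcluster


def derivative_profiles(times, expr, smoothing=None, n_grid=50):
    """Fit a cubic smoothing spline to each gene's profile (rows of expr)
    and return its first derivative evaluated on a regular time grid.
    Hypothetical helper; requires at least 4 strictly increasing time points."""
    times = np.asarray(times, dtype=float)
    grid = np.linspace(times.min(), times.max(), n_grid)
    profiles = []
    for row in np.asarray(expr, dtype=float):
        spline = UnivariateSpline(times, row, k=3, s=smoothing)
        profiles.append(spline.derivative()(grid))
    return np.vstack(profiles)


def cluster_genes(times, expr, n_clusters, smoothing=None):
    """Ward hierarchical clustering of the derivative profiles (hypothetical helper).
    Returns one cluster label (1..n_clusters) per gene."""
    deriv = derivative_profiles(times, expr, smoothing)
    tree = linkage(deriv, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

For example, two genes whose expression rises in parallel at different levels fall into the same cluster, because their smoothed profiles have the same derivative.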

This experience in biostatistics has been used extensively in my research projects. Notably, I developed a pipeline for the analysis of microarray data that is still the foundation of the statistical analyses performed by the GeT-TRiX transcriptomic facility.

Since 2015, I have gained more experience in bioinformatics for the analysis of next-generation sequencing (NGS) data, in particular using R and Bioconductor.
I developed my first R package, GeneNeighborhood, which explores the orientation and proximity of the direct upstream and downstream neighbors of a predefined set of genes.
I have also developed the NanoBAC R package and a Snakemake pipeline to assemble the reads obtained by sequencing BACs on Oxford Nanopore long-read sequencers.
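To make "direct neighbors", "orientation" and "proximity" concrete, here is a minimal Python sketch. It is not the GeneNeighborhood API (the package is written in R): the names `Gene`, `Neighbor` and `direct_neighbors` are hypothetical, coordinates are assumed to be 1-based and inclusive, and the neighbors of a gene are simply the adjacent genes in start-sorted order on the same chromosome, ignoring the subtleties of nested or overlapping genes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Neighbor:
    name: str
    orientation: str  # "same_strand", "divergent" or "convergent"
    distance: int     # intergenic gap in bp (negative if the genes overlap)


def _orientation(query: Gene, other: Gene, side: str) -> str:
    """Classify a neighbor relative to the query gene's transcription direction."""
    if other.strand == query.strand:
        return "same_strand"
    # Opposite strands: an upstream neighbor is transcribed away from the query
    # (head-to-head), a downstream neighbor towards it (tail-to-tail).
    return "divergent" if side == "upstream" else "convergent"


def direct_neighbors(genes: List[Gene],
                     query_names: Set[str]) -> Dict[str, Dict[str, Optional[Neighbor]]]:
    """Return the direct upstream and downstream neighbor of each query gene."""
    by_chrom: Dict[str, List[Gene]] = {}
    for g in genes:
        by_chrom.setdefault(g.chrom, []).append(g)

    result: Dict[str, Dict[str, Optional[Neighbor]]] = {}
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: (g.start, g.end))
        for i, g in enumerate(chrom_genes):
            if g.name not in query_names:
                continue
            left = chrom_genes[i - 1] if i > 0 else None
            right = chrom_genes[i + 1] if i + 1 < len(chrom_genes) else None
            # "Upstream" is defined by the query gene's own strand.
            upstream, downstream = (left, right) if g.strand == "+" else (right, left)
            entry: Dict[str, Optional[Neighbor]] = {}
            for side, other in (("upstream", upstream), ("downstream", downstream)):
                if other is None:
                    entry[side] = None
                    continue
                if other is left:
                    gap = g.start - other.end - 1
                else:
                    gap = other.start - g.end - 1
                entry[side] = Neighbor(other.name, _orientation(g, other, side), gap)
            result[g.name] = entry
    return result
```

Because upstream and downstream follow the query's strand, the upstream neighbor of a minus-strand gene is the next gene at higher coordinates.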


Pascal GP Martin

Senior Research Specialist (IR1) at INRAE

Related

Publications

Clustering time-series gene expression data using smoothing spline derivatives

Sparse canonical methods for biological data integration: application to a cross-platform study

CCA: An R Package to Extend Canonical Correlation Analysis

Stratégies pour l'analyse statistique de données transcriptomiques (Strategies for the statistical analysis of transcriptomic data)

Talks

GeneNeighborhood: an R package to explore the direct neighbors of your favorite gene set

Core bioconductor packages for NGS data analysis