Bioinformatics and Biostatistics
While developing transcriptomic approaches during my PhD, I quickly felt the need to learn more tools from biostatistics in order to perform thorough analyses of my data.
I soon initiated a sustained collaboration with mathematicians from the Toulouse Math Institute, in particular:
We first made an inventory of methods relevant to the analysis of transcriptomic data (Baccini et al., 2005).
To integrate the transcriptomic and lipidomic data acquired during my PhD, Ignacio Gonzalez developed the regularized version of Canonical Correlation Analysis (Gonzalez et al., 2008).
This initial work was then taken in multiple directions by Kim-Anh Lê Cao, who developed the first version of the mixOmics package during her PhD (see, for example, Lê Cao et al., 2009) and still actively leads the development of this package and of data integration methods.
With Sébastien Déjean, we also developed methods to analyze transcriptomic data acquired during time-series experiments (Déjean et al., 2010).
This experience in biostatistics has been used extensively in my research projects. Notably, I developed a pipeline for the analysis of microarray data that is still the foundation of the statistical analyses performed by the GeT-TRiX transcriptomic facility.
Since 2015, I have gained more experience in bioinformatics for the analysis of next-generation sequencing (NGS) data, in particular using R and Bioconductor.
I have developed my first R package, named GeneNeighborhood. It allows users to explore the orientation and proximity of the direct upstream and downstream neighbors of a predefined set of genes.
I have also developed the NanoBAC R package and a Snakemake pipeline to assemble the reads obtained from BAC sequencing on Oxford Nanopore long-read sequencers.