Transipedia: a project for large-scale sequence exploration and its application in cancer

RNA-seq,

Transcriptomics,

Bioinformatics,

Cancer,

Non-coding RNA.

Chloé Bessière, Thérèse Commes, Daniel Gautheret – R’nBlood Team – RNA Biology in Hematologic Tumors

The main objective of Transipedia is to make thousands of RNA-seq datasets accessible and to quickly obtain the expression profile of any sequence in these datasets: splice variant, fusion, mutation, non-coding RNA, etc. Because RNA-seq data are extremely voluminous, this is harder than it might seem. For example, 1,000 RNA-seq files amount to roughly 10 times the size of GenBank. Alignment-based search tools such as BLAST cannot operate at this scale, which is where k-mer indexes such as Reindeer come in. These indexes are 15 to 40 times smaller than the original files (fastq.gz) and can be queried with dozens of sequences per second. Our project focuses on the expression of RNAs that are not referenced in annotations, and we have shown on the CCLE dataset (@DepMapSanger) that mutations and fusions can be quantified precisely using specific ‘probes’.
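To give an intuition of what a k-mer index does, here is a minimal toy sketch in Python. It is not Reindeer's implementation or API: the function names, the tiny k, the in-memory dictionary and the choice of the median as a summary are all assumptions made for illustration, and real indexes rely on compact data structures and a much larger k.

```python
from collections import Counter, defaultdict
from statistics import median

K = 5  # toy value chosen for readability (assumption); real indexes use a larger k


def kmers(seq, k=K):
    """Yield every overlapping k-mer of seq, in order."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k=K):
    """Map each k-mer to a Counter {sample_name: occurrences in that sample's reads}."""
    index = defaultdict(Counter)
    for name, reads in samples.items():
        for read in reads:
            for km in kmers(read, k):
                index[km][name] += 1
    return index


def query(index, probe, sample_names, k=K):
    """Return, for each sample, the median count over the probe's k-mers."""
    probe_kmers = list(kmers(probe, k))
    if not probe_kmers:
        raise ValueError("probe is shorter than k")
    return {
        name: median(index.get(km, {}).get(name, 0) for km in probe_kmers)
        for name in sample_names
    }


# Hypothetical toy data: two "samples", each with a couple of short reads.
samples = {
    "A": ["ACGTACGTAC", "CGTACGTACG"],
    "B": ["TTTTTTTTTT"],
}
index = build_index(samples)
profile = query(index, "ACGTACG", ["A", "B"])
print(profile)
```

The query never touches the reads themselves: it only looks up the k-mers of the probe in the index, which is why this kind of search can stay fast when the number of samples grows.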

Transipedia should enable clinicians and biologists to explore patient RNA-seq data rapidly and to quantify targeted transcriptional events such as biomarker expression, questions that today can be addressed only with complex pipelines.

The main prospects for this work are (i) to make a growing number of public RNA-seq datasets available to the scientific community and (ii) to create an encyclopedia of query sequences (‘probes’) for the main alterations found in numerous cancers (a panel of mutations and fusions).
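As an illustration of what a ‘probe’ for a point mutation could look like, the sketch below builds a reference probe and a mutated probe centred on the altered base. This is one plausible design, assumed here for the example and not taken from the text above: with k - 1 bases on each side, every k-mer of the probe overlaps the mutated position. The function name, the 0-based coordinate convention and the toy sequence are hypothetical.

```python
def mutation_probe(reference, pos, alt, k=31):
    """Return (ref_probe, alt_probe) of length 2k-1 centred on 0-based position pos.

    Assumption: a single-nucleotide substitution, with k-1 flanking bases on each
    side so that every k-mer of the probe covers the substituted base.
    """
    flank = k - 1
    if pos - flank < 0 or pos + flank >= len(reference):
        raise ValueError("not enough flanking sequence around pos")
    left = reference[pos - flank:pos]
    right = reference[pos + 1:pos + 1 + flank]
    ref_probe = left + reference[pos] + right
    alt_probe = left + alt + right
    return ref_probe, alt_probe


# Hypothetical toy reference, with a small k so the result is easy to read.
ref_probe, alt_probe = mutation_probe("AAACCCGTTTGGG", pos=6, alt="A", k=4)
print(ref_probe, alt_probe)
```

Querying both probes against an index gives one count for the reference allele and one for the mutated allele in each sample.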

You can play with Transipedia and your own sequences here:
https://transipedia.org
and get inspired by our examples:
https://github.com/Transipedia/Reindeer-use-cases

Discover the published article

Genome Biol. 2024 Oct 10;25(1):266. doi: 10.1186/s13059-024-03413-5. PMID: 39390592. PMCID: PMC11468207.
Chloé Bessière, Haoliang Xue, Benoit Guibert, Anthony Boureux, Florence Rufflé, Julien Viot, Rayan Chikhi, Mikaël Salson, Camille Marchet, Thérèse Commes, Daniel Gautheret
Transipedia.org: k-mer-based exploration of large RNA sequencing datasets and application to cancer data

 

Collaborations and partnerships

Collaborations:

Bio2m team – Prof. T. COMMES (IRMB, Montpellier), Dr. D. GAUTHERET (I2BC, Paris-Saclay), Dr. C. MARCHET (CRIStAL, Université de Lille), Dr. R. CHIKHI (Pasteur, Paris)

Funding:

  • Agence Nationale de la Recherche (ANR-18-CE45-0020, ANR-22-CE45-0007, ANR-19-CE45-0008, PIA/ANR16-CONV-0005, ANR-19-P3IA-0001)
  • Immun4Cure IHU “Institute for innovative immunotherapies in autoimmune diseases”
  • European Union’s Horizon 2020 research and innovation program, Marie Skłodowska-Curie grant
  • Fondation de France
Centre de Recherches en Cancérologie de Toulouse

Toulouse Cancer Research Center (Oncopole)

Toulouse - FR

