Each study available in RGV was manually curated to extract relevant information:
- Information related to the associated publication (PubMed ID, authors, full publication title, abstract)
- The species investigated and the technology used
- Relevant information about the investigated samples (tissue, sex, developmental stage, etc.)
- Any relevant keywords
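The curated fields listed above map naturally onto a simple record type. The sketch below is hypothetical: the class names `Sample` and `CuratedStudy`, their field names and the helper method are illustrative assumptions, not RGV's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """One investigated sample (hypothetical schema)."""
    accession: str            # sample identifier, e.g. a GEO sample accession
    tissue: str
    sex: str
    developmental_stage: str


@dataclass
class CuratedStudy:
    """Manually curated metadata for one study (hypothetical schema)."""
    pubmed_id: int
    authors: List[str]
    title: str
    abstract: str
    species: str
    technology: str
    samples: List[Sample] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)

    def tissues(self) -> List[str]:
        """Return the distinct tissues investigated, in first-seen order."""
        seen: List[str] = []
        for sample in self.samples:
            if sample.tissue not in seen:
                seen.append(sample.tissue)
        return seen
```

Keeping samples as a list of typed records, rather than free text, is what makes per-tissue, per-sex or per-stage filtering straightforward once studies are loaded.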
When available, the raw data associated with the study are then processed using a multi-step workflow that makes them compatible with the RGV visualization tools.
For each sample, seven processing steps are sequentially performed:
- 1- The raw data file (SRA file) of a given sample and its associated information are downloaded from the Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA);
- 2- The FASTQ file is extracted from the SRA archive file;
- 3- Reads are mapped to the corresponding reference genome using the STAR aligner;
- 4- The resulting BAM file is converted into a simple tab-delimited text file (BED) in which the signal is normalized to the total number of read counts using BEDTools;
- 5- The normalized BED file is then converted into an indexed binary format (bigWig, .bw) to enable fast remote access to the data;
- 6- Using the pairwise alignments between genome assemblies and between species provided by UCSC, CrossMap converts the genome coordinates in the resulting bigWig file from one genome assembly or species to another;
- 7- Reference transcripts are quantified in each sample using StringTie.
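The seven steps can be sketched as a sequence of command lines for one sample. This is a minimal sketch, not RGV's actual workflow: the function names, file naming, thread count, single-end layout and the choice of `prefetch`/`fastq-dump` (SRA Toolkit), `bedtools genomecov`, UCSC `bedGraphToBigWig` and `CrossMap.py bigwig` are assumptions, and reads-per-million (RPM) scaling is an assumed normalization scheme. Metadata retrieval from GEO is omitted, and the code only builds the commands, so it can be inspected without the tools installed.

```python
from typing import List


def rpm_scale_factor(total_mapped_reads: int) -> float:
    """Factor that converts raw coverage to reads per million (assumed normalization)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return 1_000_000 / total_mapped_reads


def build_sample_commands(
    run_id: str,
    genome_dir: str,
    chrom_sizes: str,
    chain_file: str,
    annotation_gtf: str,
    total_mapped_reads: int,
    threads: int = 4,
) -> List[List[str]]:
    """Return one argv list per processing step for a single-end sample (hypothetical layout)."""
    prefix = f"{run_id}_"
    bam = f"{prefix}Aligned.sortedByCoord.out.bam"  # STAR's name for sorted BAM output with this prefix
    scale = f"{rpm_scale_factor(total_mapped_reads):.6g}"
    return [
        # 1. Download the .sra file from SRA
        ["prefetch", run_id],
        # 2. Extract the FASTQ file from the .sra archive (writes <run_id>.fastq)
        ["fastq-dump", "--outdir", ".", run_id],
        # 3. Map reads to the reference genome with STAR, writing a coordinate-sorted BAM
        ["STAR", "--runThreadN", str(threads), "--genomeDir", genome_dir,
         "--readFilesIn", f"{run_id}.fastq",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", prefix],
        # 4. Scaled coverage in bedGraph format; -split handles spliced reads.
        #    Stdout must be redirected to <run_id>.bedGraph and may need
        #    `sort -k1,1 -k2,2n` before step 5.
        ["bedtools", "genomecov", "-ibam", bam, "-bg", "-split", "-scale", scale],
        # 5. Convert the bedGraph into an indexed bigWig
        ["bedGraphToBigWig", f"{run_id}.bedGraph", chrom_sizes, f"{run_id}.bw"],
        # 6. Convert bigWig coordinates to another assembly/species using a UCSC chain file;
        #    CrossMap writes output files starting with the given prefix
        ["CrossMap.py", "bigwig", chain_file, f"{run_id}.bw", f"{run_id}.lifted"],
        # 7. Quantify reference transcripts with StringTie (-e: only annotated transcripts)
        ["stringtie", bam, "-G", annotation_gtf, "-e", "-p", str(threads),
         "-o", f"{run_id}.gtf"],
    ]
```

Returning argv lists rather than shell strings keeps each step testable and lets a runner such as `subprocess.run` execute them without shell quoting issues; in practice the total mapped read count would be read from the BAM file produced in step 3.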
Framework and plugins
The web interface of the ReproGenomics Viewer database was built with the web framework Django.
All charts generated by RGV's tools use the plotly.js library.
Chart layouts are managed using Golden Layout.
The full source code of the ReproGenomics Viewer web application is available here.