The web interface of the ReproGenomics Viewer (RGV) database was built with the Django web framework.
Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.
The full source code of the ReproGenomics Viewer web application is available here.
The backbone of the RGV is the series of tools for processing and organizing data within the system.
The system integrates an implementation of the JBrowse genome browser grafted onto the RGV backbone.
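In a JBrowse-based viewer, bigWig coverage files are typically exposed to users as tracks. The sketch below generates one wiggle track per sample. It assumes JBrowse 1.x and its `trackList.json` configuration, because the text does not say which JBrowse version or configuration mechanism the RGV uses. The helper names, labels and file paths are hypothetical. `storeClass` and `type` are the standard JBrowse 1 class names for bigWig data.

```python
import json


def bigwig_track(label: str, key: str, url: str) -> dict:
    """Hypothetical helper: JBrowse 1.x track definition for one bigWig file."""
    return {
        "label": label,  # unique, machine-readable track identifier
        "key": key,      # human-readable name shown in the track list
        "storeClass": "JBrowse/Store/SeqFeature/BigWig",  # reads .bw data
        "urlTemplate": url,  # path of the .bw file, relative to the data directory
        "type": "JBrowse/View/Track/Wiggle/XYPlot",       # draws the signal as an XY plot
    }


def track_list_json(tracks: list[dict]) -> str:
    """Serialize track definitions as the content of a trackList.json file."""
    return json.dumps({"tracks": tracks}, indent=2)
```

If you write the returned string to the `trackList.json` of a JBrowse 1 data directory, each sample should appear as a selectable coverage track.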
For each study, four types of information are manually extracted and curated:
- 1- the scientific name of each species and the genome release used to perform and analyze the experiments;
- 2- the associated scientific publication;
- 3- each biological topic investigated in the study;
- 4- the high-throughput technologies used.
Each raw data file of a given data set then undergoes a series of automatic processing steps that make it fully compatible with the RGV system.
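The four types of curated information listed above fit naturally into a simple per-study record. The sketch below is illustrative only. The `Study` class, its field names and the `build_index` helper are hypothetical and do not reflect the RGV's actual Django models. The sketch shows how such records can be indexed so that each species, publication, topic or technology points to the studies annotated with it.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Study:
    """Hypothetical record holding the four curated information types of one study."""
    study_id: str
    species: dict[str, str]  # scientific name -> genome release used for the analysis
    publication: str         # identifier of the associated publication
    topics: list[str] = field(default_factory=list)        # biological topics investigated
    technologies: list[str] = field(default_factory=list)  # high-throughput technologies used


def build_index(studies: list[Study]) -> dict[str, dict[str, list[str]]]:
    """Map each category value to the IDs of the studies annotated with it."""
    index = {
        "species": defaultdict(list),
        "publications": defaultdict(list),
        "topics": defaultdict(list),
        "technologies": defaultdict(list),
    }
    for study in studies:
        for name in study.species:
            index["species"][name].append(study.study_id)
        index["publications"][study.publication].append(study.study_id)
        for topic in study.topics:
            index["topics"][topic].append(study.study_id)
        for tech in study.technologies:
            index["technologies"][tech].append(study.study_id)
    # Convert the defaultdicts to plain dicts so that unknown keys raise KeyError.
    return {category: dict(groups) for category, groups in index.items()}
```

For example, a study annotated with `{"Mus musculus": "mm10"}` and `["RNA-seq"]` would be listed under both `index["species"]["Mus musculus"]` and `index["technologies"]["RNA-seq"]`.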
For each sample, eight processing steps are sequentially performed:
- 1- the raw data file (SRA file) of a given sample and its associated information are downloaded from the Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA);
- 2- the FASTQ file is extracted from the SRA archive;
- 3- the reads are mapped to the corresponding reference genome using STAR;
- 4- each resulting BAM file is converted with BEDTools into a simple tab-delimited text file (BED) in which the data are standardized according to the total number of read counts;
- 5- the standardized BED file is then converted into an indexed binary format (bigWig, .bw) to enable fast remote access to the data;
- 6- using the pairwise alignments between genome assemblies and between species provided by UCSC, CrossMap converts the genome coordinates of the resulting bigWig file from the genome of one species into the genome of another;
- 7- reference transcripts are quantified in each sample using StringTie;
- 8- finally, the manually extracted information is used to organize the processed data into four broad categories: biological topics, technologies, publications and species.
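Steps 1 to 7 above can be sketched as a small Python driver that assembles one command line per step and runs them in order. This is a minimal sketch, not the RGV's actual pipeline. The helper names, file layout, single-threaded defaults and the per-million-reads scale factor are all assumptions, because the text does not say exactly how counts were standardized. Metadata retrieval from GEO and the manual curation of step 8 are not shown. The tools and flags are those documented for the SRA Toolkit (`prefetch`, `fastq-dump`), STAR, BEDTools `genomecov`, UCSC `bedGraphToBigWig`, CrossMap and StringTie. Depending on its version, CrossMap may be installed as `CrossMap.py` or as `CrossMap`. `genomecov -bg` emits bedGraph, a four-column BED-like format, which is assumed here to be the "BED" file of step 4.

```python
import subprocess
from pathlib import Path


def rpm_scale_factor(total_reads: int) -> float:
    """Assumed normalization: express coverage per million reads (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 1_000_000 / total_reads


def sample_commands(run, genome_dir, chrom_sizes, chain_file, gtf,
                    total_reads, workdir="work", paired=False, threads=4):
    """Build (argv, stdout_path) pairs for steps 1-7 of one sample (hypothetical helper)."""
    w = Path(workdir)
    # --split-3 writes RUN_1/RUN_2.fastq for read pairs and RUN.fastq for single reads
    fastqs = ([str(w / f"{run}_1.fastq"), str(w / f"{run}_2.fastq")]
              if paired else [str(w / f"{run}.fastq")])
    prefix = str(w / f"{run}_")
    bam = f"{prefix}Aligned.sortedByCoord.out.bam"  # STAR's sorted-BAM output name
    bedgraph = w / f"{run}.bedGraph"
    sorted_bg = w / f"{run}.sorted.bedGraph"
    bigwig = w / f"{run}.bw"
    scale = f"{rpm_scale_factor(total_reads):.6g}"
    return [
        # 1. download the run archive from SRA
        (["prefetch", run], None),
        # 2. extract FASTQ file(s) from the archive
        (["fastq-dump", "--split-3", "--outdir", str(w), run], None),
        # 3. map reads and write a coordinate-sorted BAM
        (["STAR", "--runThreadN", str(threads), "--genomeDir", genome_dir,
          "--readFilesIn", *fastqs, "--outSAMtype", "BAM", "SortedByCoordinate",
          "--outFileNamePrefix", prefix], None),
        # 4. scaled coverage as bedGraph; -split honours spliced alignments
        (["bedtools", "genomecov", "-ibam", bam, "-bg", "-split", "-scale", scale], bedgraph),
        # bedGraphToBigWig expects input sorted by chromosome, then start
        (["sort", "-k1,1", "-k2,2n", str(bedgraph)], sorted_bg),
        # 5. convert to the indexed binary bigWig format
        (["bedGraphToBigWig", str(sorted_bg), chrom_sizes, str(bigwig)], None),
        # 6. lift coordinates to another assembly with a UCSC chain file
        (["CrossMap.py", "bigwig", chain_file, str(bigwig), str(w / f"{run}.lifted")], None),
        # 7. quantify reference transcripts only (-e requires -G)
        (["stringtie", bam, "-G", gtf, "-e", "-o", str(w / f"{run}.gtf"),
          "-A", str(w / f"{run}.abundance.tab")], None),
    ]


def run_commands(commands):
    """Run each command in order, redirecting stdout to a file when requested."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

A call such as `run_commands(sample_commands("RUN_ID", "star_index/", "chrom.sizes", "old_to_new.over.chain.gz", "genes.gtf", total_reads=n))` would process one sample. All arguments in that call are placeholders, and the working directory must exist beforehand. Keeping command construction separate from execution makes each step easy to log, inspect and test without the tools installed.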