Services: Genes and genomes

Name of service Tag Related links* Key Collection
Ocean Gene Atlas

The Ocean Gene Atlas service provides data mining access to three complementary data objects: gene sequence catalogs (ENA), sample environmental context (PANGAEA), and gene abundance estimates in samples (computed by mapping raw sequence reads onto gene catalogs). User queries consist of either a sequence (nucleic or protein) or a hidden Markov model derived from a multiple sequence alignment. Homologs of the user query in the gene catalogs are identified using standard sequence similarity search tools (e.g. BLAST or HMMER), and their read-based estimated abundances are displayed in interactive world maps and ecological plots. A phylogenetic tree is also inferred in order to situate the user query within its context of marine environmental homologs as well as known homologs from reference sequences.

OMA

OMA identifies orthologs among 2000 genomes from all domains of life. Other distinctive characteristics are the high quality of its inferences, the feature-rich web interface, and a frequent update schedule of two releases per year.

ORCAE

Online collaborative genome annotation resource offering a range of tools and information to validate and correct gene annotations.

Orphadata

Orphadata provides the scientific community with comprehensive, quality data sets related to rare diseases and orphan drugs from the Orphanet knowledge base, in reusable formats.

CDD
Orphanet

Orphanet is the reference resource for information and data on rare diseases and orphan drugs. From its knowledge base, Orphanet derives an ontology of rare diseases, as well as information and data on rare diseases.

OrthoDB

OrthoDB is a comprehensive catalog of evolutionary and functional annotations of orthologs, covering over 22 million genes from over 5000 species of animals, fungi, plants, archaea, bacteria, and viruses. 

ParameciumDB

ParameciumDB is a community model organism database for the ciliate Paramecium. The web site gives access to genomes of many Paramecium species and their annotations. ParameciumDB also integrates genome-wide datasets (DNA-seq, RNA-seq, ChIP-seq) provided by the community. This portal is used to query, retrieve, visualize and compare the most up-to-date public data.

PatSearch

Searches user-submitted sequences for any combination of Position Weight Matrices (PWMs), primary sequence patterns and structural motifs.
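As a rough illustration of what scanning a sequence with a PWM involves, the sketch below slides a small matrix along a sequence and reports every window whose summed score reaches a threshold. It is not PatSearch's implementation: the matrix values, the `scan_pwm` function name and the threshold are all hypothetical, and the input is assumed to contain only uppercase A, C, G and T.

```python
# Hypothetical 3-column PWM (log-odds style scores), invented for illustration.
PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
]


def scan_pwm(sequence, pwm, threshold):
    """Return (0-based start, score) for each window scoring >= threshold.

    Assumes the sequence contains only the bases present in the matrix.
    """
    hits = []
    width = len(pwm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the per-position score of each base in the window.
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


hits = scan_pwm("TTAGCAAGCT", PWM, threshold=3.0)
for start, score in hits:
    print(start, score)
```

Real tools typically also scan the reverse complement and derive thresholds from score distributions; both are left out here to keep the sketch short.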

PHI-base

A catalogue of experimentally-verified pathogenicity, virulence and effector genes involved in the infection of animal, plant, fungal and/or insect hosts.

PhyML

PhyML is software that estimates maximum likelihood phylogenies from alignments of nucleotide or amino acid sequences. The main strength of PhyML lies in the large number of substitution models coupled to various options to search the space of phylogenetic tree topologies, going from very fast and efficient methods to slower but generally more accurate approaches. PhyML was designed to process moderate to large data sets. In theory, alignments of up to 4,000 sequences, each up to 2,000,000 characters long, can be processed. PhyML can process data sets made of multiple genes and fit sophisticated substitution models with heterogeneous components across partition elements.

PiCnIc

Pipeline for Cancer Inference

A pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes.

PIPPA

Web-interface and database providing tools for the management of different plant phenotyping platforms, and the analysis of images and data.

PlantsDB

Provides a data and information resource for individual plant species.

PLAZA

Plant-oriented online resource for comparative, evolutionary and functional genomics.

RIR
PredictSNP

Tool for prediction of disease-related mutations in proteins. Version 2 (PredictSNP 2), for prediction of disease-related mutations within the human genome, has been available since 2016.

Primer3

Primer3 is a program for designing PCR primers and oligos. 

RAP

RNA-Seq Analysis Pipeline

A cloud computing web application implementing a complete and modular RNA-Seq analysis workflow.

ReadXplorer

Exploring and evaluating NGS data, with a modular programming structure that makes it easy to add plugins.

REDIdb

A database annotating organellar RNA editing processes in their biological context.

REDIportal

A database of RNA editing events in humans from RNA-Seq and DNA-Seq data.

REDItools

Python scripts developed to study RNA editing at genomic scale using next-generation sequencing data.

RepeatExplorer

Set of tools and a web server for complex characterization of repetitive DNA based on next-generation sequencing reads.

REPET

The REPET package integrates bioinformatics pipelines dedicated to detecting, annotating and analysing transposable elements (TEs) in genomic sequences. The main pipelines are (i) TEdenovo, which searches for interspersed repeats, builds consensus sequences and classifies them according to TE features, and (ii) TEannot, which mines a genome with a library of TE sequences, for instance the one produced by the TEdenovo pipeline, to provide TE annotations exported into GFF3 files.
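For readers unfamiliar with GFF3, the sketch below formats a single annotation as one GFF3 record: nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes) with 1-based inclusive coordinates. It only illustrates the file format; the `gff3_line` helper and the example values are hypothetical and are not taken from REPET's actual output.

```python
def gff3_line(seqid, source, feature_type, start, end, strand,
              attributes, score=".", phase="."):
    """Format one feature as a GFF3 record (nine tab-separated columns).

    Coordinates are 1-based and inclusive. Attribute values are assumed
    not to need escaping, which a real writer would have to handle.
    """
    attr = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, feature_type, str(start), str(end),
               str(score), strand, phase, attr]
    return "\t".join(columns)


# Hypothetical TE annotation, invented for illustration.
record = gff3_line("chr1", "example_source", "transposable_element",
                   100, 250, "+", {"ID": "te1"})
print("##gff-version 3")
print(record)
```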

Rfam

The Rfam database is a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs).

RNA Galaxy Workbench

Providing access to many NGS and RNA tools, visualisations, interactive environments (e.g. IPython) as well as various utilities, reference genomes and data libraries.

RNA-seq end-to-end workflow

End-to-end gene-level RNA-Seq differential expression workflow using Bioconductor packages. Starting from FASTQ files, reads are aligned to the reference genome, and a count matrix is prepared that tallies the number of RNA-seq reads/fragments within each gene for each sample. The workflow then covers exploratory data analysis (EDA) for quality assessment and for exploring the relationship between samples, differential gene expression analysis, and visual exploration of the results.
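The count matrix mentioned above can be pictured with a small sketch: given, for each sample, the gene each read was assigned to, tally reads per gene and arrange the tallies as one row per gene and one column per sample. This is a plain-Python illustration and not the Bioconductor workflow itself; the `build_count_matrix` function, the sample names and the gene IDs are hypothetical.

```python
from collections import Counter


def build_count_matrix(assignments):
    """Build a gene-by-sample count matrix.

    assignments maps a sample name to a list of gene IDs, one entry per
    read (or fragment) assigned to that gene.
    Returns (sample names, {gene: [count per sample, in the same order]}).
    """
    per_sample = {sample: Counter(genes) for sample, genes in assignments.items()}
    samples = sorted(per_sample)
    genes = sorted({gene for counts in per_sample.values() for gene in counts})
    # A Counter returns 0 for genes that a sample never saw.
    matrix = {gene: [per_sample[sample][gene] for sample in samples]
              for gene in genes}
    return samples, matrix


# Hypothetical read assignments, invented for illustration.
samples, matrix = build_count_matrix({
    "s1": ["geneA", "geneA", "geneB"],
    "s2": ["geneB"],
})
print(samples)
for gene, counts in matrix.items():
    print(gene, counts)
```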

Roddy

Roddy is a framework for large-scale NGS processing pipelines at petabyte scale. It is used for the management of workflows in the Pan-Cancer Analysis of Whole Genomes (PCAWG) project.

rPredictor

Web tool for prediction of rRNA secondary structures.

SalmoBase

A comprehensive data resource for salmonid species based on different omics data.

SARS-CoV-2 DB

A database of high-quality, curated and freely accessible SARS-CoV-2 genomics and contextual resources.