Protein feature annotation

The InterProScan 5 pipeline [1] is used to annotate the translations of gene models for each genome. The pipeline scans sequences against InterPro [2] signatures to identify protein families and domains. InterPro signatures are predictive models provided by the member databases of the InterPro consortium. In addition, coiled coils, signal peptides, transmembrane domains, and low complexity regions are annotated with ncoils, SignalP, TMHMM, and seg, respectively.
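As an illustration of what a single annotation looks like, the sketch below reads one line of InterProScan 5 tab-separated output into a small record. It assumes the standard TSV column order of InterProScan 5 (protein accession, MD5, length, analysis, signature accession, signature description, start, stop, score, status, date, InterPro accession, InterPro description). The ProteinFeature class, the parse_tsv_line function, and every value in the example line are invented for illustration; this is not how the Ensembl pipeline itself stores features.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProteinFeature:
    """One signature match on a translation (hypothetical record type)."""
    translation_id: str
    analysis: str            # source database or tool, e.g. "Pfam"
    signature: str           # accession of the matched signature
    start: int               # 1-based start on the protein sequence
    end: int                 # inclusive end on the protein sequence
    interpro: Optional[str]  # InterPro accession, if the signature is integrated


def parse_tsv_line(line: str) -> ProteinFeature:
    """Parse one line of InterProScan 5 TSV output (assumed column order)."""
    cols = line.rstrip("\n").split("\t")
    # Column 12 (index 11) is optional; "-" or empty means no InterPro entry.
    interpro = cols[11] if len(cols) > 11 and cols[11] not in ("", "-") else None
    return ProteinFeature(
        translation_id=cols[0],
        analysis=cols[3],
        signature=cols[4],
        start=int(cols[6]),
        end=int(cols[7]),
        interpro=interpro,
    )


# Invented example line: none of these identifiers refer to real entries.
example_line = "\t".join([
    "TRANSLATION_0001", "0" * 32, "250", "Pfam", "PF_PLACEHOLDER",
    "placeholder domain", "10", "120", "1.0E-20", "T", "01-01-2020",
    "IPR_PLACEHOLDER", "placeholder family",
])
feature = parse_tsv_line(example_line)
```

Keeping the analysis name alongside the signature accession makes it possible to tell matches from different member databases apart, even when they map to the same InterPro entry.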

InterPro families and domains are often associated with Gene Ontology (GO) terms and pathways, and these associations are loaded when protein features are annotated. For example, the IPR013483 family is linked, by InterPro, to two GO terms and a KEGG pathway reference; consequently, these will be added as cross-references to any translation that is annotated with the IPR013483 family.
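The propagation of cross-references described above can be sketched as a simple lookup. This is a minimal illustration, not the pipeline's actual code: the function name propagate_xrefs, the translation identifiers, and the GO and KEGG values are all placeholders (the real terms linked to IPR013483 are not reproduced here).

```python
from collections import defaultdict

# Hypothetical mapping from an InterPro entry to its linked terms.
# The values are placeholders, not the real GO terms or KEGG pathway.
INTERPRO_XREFS = {
    "IPR013483": ["GO:placeholder_1", "GO:placeholder_2", "KEGG:placeholder_pathway"],
}


def propagate_xrefs(annotations, interpro_xrefs):
    """Attach the terms linked to each InterPro entry to every translation
    annotated with that entry.

    annotations: iterable of (translation_id, interpro_accession) pairs.
    Returns a dict mapping translation_id to a sorted list of cross-references.
    """
    xrefs = defaultdict(set)
    for translation_id, interpro_id in annotations:
        # Entries without linked terms contribute nothing.
        xrefs[translation_id].update(interpro_xrefs.get(interpro_id, []))
    return {tid: sorted(terms) for tid, terms in xrefs.items()}


# Invented translations: two carry IPR013483, one carries an unlinked entry.
annotations = [
    ("TRANSLATION_0001", "IPR013483"),
    ("TRANSLATION_0002", "IPR013483"),
    ("TRANSLATION_0003", "IPR_UNLINKED"),
]
result = propagate_xrefs(annotations, INTERPRO_XREFS)
```

Using a set per translation means a term is recorded only once, even if several features on the same translation point to the same InterPro entry.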

In the genome browser, protein features are displayed on the 'Splice variants' page (when viewing a gene) and on the 'Protein summary' and 'Domains & features' pages (when viewing a transcript).

References

  1. Jones P et al. (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236-1240
  2. Mitchell A et al. (2014) The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res.