What's New in Release 49
Release 49 of Ensembl Bacteria is a major update of all of its species. All bacterial genomes were freshly reloaded from ENA. To help with scalability, we filtered redundant proteomes following UniProt criteria, reducing our total number of bacterial genomes to 31,332. See more details about this update in our blog. Ensembl Bacteria also has an updated pan-taxonomic compara (which includes key bacterial species).
New and updated genomes
- A total of 31,332 bacterial and archaeal genomes, of which 22,088 are new. The new genomes include 28 strains of Bacteroides vulgatus, a bacterium highly prevalent in the human gastrointestinal microbiota, and 16 strains of Prevotella copri, an intestinal anaerobic bacterium correlated with the development of rheumatoid arthritis.
- 567 genomes have been renamed in the NCBI taxonomy database since our last update. In particular, 6 species in the pan-taxonomic compara have been renamed.
- 34,804 genomes have been removed (mostly because UniProt marked them as redundant). In particular, 15 species that used to be in the pan-taxonomic compara have now been removed.
- Annotation of pathogen-host interaction data (PHI-base version 2019-09-16).
- Alignments to Rfam covariance models (Rfam 12.2), visible in a new track (‘Rfam models’).
- Updated protein features for all species using InterProScan with version 77.0 of InterPro.
- Bacterial species names used within our production processes now have the assembly accession as a suffix (e.g. streptococcus_pneumoniae_tigr4 is now named streptococcus_pneumoniae_tigr4_gca_000006885). Please amend any stored bookmarks for species pages.
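For anyone updating stored species names in bulk, the renaming described above can be sketched in a few lines of Python. This is only an illustration based on the single example given here: the function name `with_assembly_suffix` is hypothetical, and the rule it applies (lower-case the accession, drop any version such as ".1", join with an underscore) is an assumption inferred from that example, not a statement of how the production pipeline works.

```python
import re


def with_assembly_suffix(old_name: str, assembly_accession: str) -> str:
    """Return old_name with the assembly accession appended as a suffix.

    Assumption (inferred from one example, not an official rule): the
    accession is lower-cased and any version part (".1") is dropped.
    """
    suffix = assembly_accession.split(".")[0].lower()
    # Accept only accessions shaped like GCA_000006885 or GCF_000006885.
    if not re.fullmatch(r"gc[af]_\d+", suffix):
        raise ValueError(f"unexpected assembly accession: {assembly_accession!r}")
    # Leave names alone if they already carry the suffix.
    if old_name.endswith("_" + suffix):
        return old_name
    return f"{old_name}_{suffix}"


print(with_assembly_suffix("streptococcus_pneumoniae_tigr4", "GCA_000006885"))
```

With the example from the list above, this prints streptococcus_pneumoniae_tigr4_gca_000006885. Names that already end in the suffix are returned unchanged, so the function can be run safely over a mixed set of old and new bookmarks.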
Archive of release 45 of EnsemblBacteria: eg45-bacteria.ensembl.org (Sep 2019)
Archive of release 40 of EnsemblBacteria: eg40-bacteria.ensembl.org (July 2018)
Archive of release 37 of EnsemblBacteria: eg37-bacteria.ensembl.org (October 2017)
Ensembl Bacteria is a browser for bacterial and archaeal genomes. These are taken from the databases of the International Nucleotide Sequence Database Collaboration (the European Nucleotide Archive at the EBI, GenBank at the NCBI, and the DNA Database of Japan).
As of release 35 (April 2017), we have only integrated new sequences that are non-redundant when compared to the existing data set, according to the criteria of the UniProt Knowledgebase (DOI: 10.1093/database/baw139). From release 49, we are only hosting non-redundant prokaryotic genomes. All existing data will continue to be available via the archive sites.
Data can be visualised through the Ensembl genome browser and accessed programmatically via our Perl and RESTful APIs. Data is also accessible through public MySQL databases and our FTP site, which contains full data dumps in FASTA, EMBL, GTF, GFF3, JSON and RDF formats. A selection of over 100 key bacterial genomes has been included in the pan-taxonomic compara, and genes from all genomes have been classified into families using HAMAP and PANTHER.
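As a rough illustration of programmatic access over REST, the Python sketch below builds a request URL for genome metadata and fetches it as JSON. Several things here are assumptions rather than details given on this page: the server address `https://rest.ensembl.org`, the `info/genomes/<name>` endpoint path, and the helper names `genome_info_url` and `fetch_genome_info`. Please check the current REST documentation before relying on any of them.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public REST server address; not stated on this page.
REST_SERVER = "https://rest.ensembl.org"


def genome_info_url(production_name: str, server: str = REST_SERVER) -> str:
    """Build the URL for a genome-metadata request (assumed endpoint path)."""
    quoted = urllib.parse.quote(production_name, safe="")
    return f"{server}/info/genomes/{quoted}?content-type=application/json"


def fetch_genome_info(production_name: str, timeout: float = 30.0) -> dict:
    """Fetch genome metadata as a dict; needs network access."""
    url = genome_info_url(production_name)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no network access:
print(genome_info_url("streptococcus_pneumoniae_tigr4_gca_000006885"))
```

Calling `fetch_genome_info("streptococcus_pneumoniae_tigr4_gca_000006885")` would then issue the request. Whether that name resolves depends on the release the server is hosting, so a production script should catch `urllib.error.HTTPError`.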