For full documentation, including examples from a wide range of languages, visit

Release 49 of Ensembl Bacteria included a major update of all of its species. All the bacterial genomes were freshly reloaded from ENA. To help with scalability, we filtered redundant proteomes following UniProt criteria, reducing our total number of bacterial genomes to 31,332. See more details about this update in our blog. Ensembl Bacteria also has an updated pan-taxonomic compara (which includes key bacterial species).

  • New and updated genomes

    • A total of 31,332 bacterial and archaeal genomes. These include 22,088 new genomes, among them 28 new strains of Bacteroides vulgatus, a bacterium highly prevalent in the human gastrointestinal microbiota, and 16 new strains of Prevotella copri, an intestinal anaerobic bacterium correlated with the development of rheumatoid arthritis.
  • Renamed genomes

    • 567 genomes have been renamed in the NCBI taxonomy database since our last update. In particular, 6 species in the pan-taxonomic compara have been renamed.
  • Removed genomes

    • 34,804 genomes have been removed (mostly because UniProt marked them as redundant). In particular, 15 species that used to be in the pan-taxonomic compara have been removed.
  • Updated data

    • Annotation of pathogen-host interaction data (PHI-base version 2019-09-16).
    • Alignments to Rfam covariance models (Rfam 12.2), visible in a new track (‘Rfam models’).
    • Updated protein features for all species using InterProScan with version 77.0 of InterPro.
    • Bacterial species names used within our production processes now have the assembly accession as a suffix (e.g. streptococcus_pneumoniae_tigr4 is now named
      streptococcus_pneumoniae_tigr4_gca_000006885). Please amend any stored bookmarks for species pages.
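
The renaming rule in the last item can be sketched as a small helper, which may be useful when updating stored bookmarks in bulk. This is a minimal illustration, not the production code: the function name production_name_with_accession is hypothetical, and it assumes that the suffix is the lower-cased assembly accession with any version part (such as ".1") dropped. That matches the example given above but is not confirmed for every genome.

```python
import re


def production_name_with_accession(old_name: str, accession: str) -> str:
    """Append an assembly accession to a production name (hypothetical helper)."""
    # Assumption: a version part is dropped, e.g. "GCA_000006885.1" -> "GCA_000006885".
    base = accession.split(".", 1)[0]
    # Lower-case, then replace anything that is not a letter, digit or underscore.
    suffix = re.sub(r"[^a-z0-9_]", "_", base.lower())
    return f"{old_name}_{suffix}"


if __name__ == "__main__":
    # The ".1" version here is illustrative input only.
    print(production_name_with_accession(
        "streptococcus_pneumoniae_tigr4", "GCA_000006885.1"))
    # prints: streptococcus_pneumoniae_tigr4_gca_000006885
```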

Archive sites

Archive of release 45 of EnsemblBacteria: (Sep 2019)

Archive of release 40 of EnsemblBacteria: (July 2018)

Archive of release 37 of EnsemblBacteria: (October 2017)

Ensembl Bacteria

Ensembl Bacteria is a browser for bacterial and archaeal genomes. These are taken from the databases of the International Nucleotide Sequence Database Collaboration (the European Nucleotide Archive at the EBI, GenBank at the NCBI, and the DNA Data Bank of Japan).

Non-redundant genomes

Since release 35 (April 2017), we have integrated only new sequences that are non-redundant when compared to the existing data set, according to the criteria of the UniProt Knowledgebase (DOI: 10.1093/database/baw139). From release 49, we are only hosting non-redundant prokaryotic genomes. All existing data will continue to be available via the archive sites.

Data access

Data can be visualised through the Ensembl genome browser and accessed programmatically via our Perl and RESTful APIs. Data is also accessible through public MySQL databases and our FTP site, which contains full data dumps in FASTA, EMBL, GTF, GFF3, JSON and RDF formats. A selection of over 100 key bacterial genomes has been included in the pan-taxonomic compara, and genes from all genomes have been classified into families using HAMAP and PANTHER (more details).
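
As a rough illustration of programmatic access over REST, the sketch below builds a request for the info/assembly/:species endpoint and, when called, fetches the result as JSON. Several things here are assumptions rather than statements from this page: the server address https://rest.ensembl.org, the availability of any given bacterial genome on that server, and the helper names assembly_info_url and fetch_assembly_info, which are hypothetical. The species name is the example production name from the release notes above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Ensembl REST server; this page does not give the address.
REST_SERVER = "https://rest.ensembl.org"


def assembly_info_url(species: str, server: str = REST_SERVER) -> str:
    """Build the URL for the info/assembly/:species endpoint (hypothetical helper)."""
    # Percent-encode the species name so it is safe to use as one path segment.
    return f"{server}/info/assembly/{urllib.parse.quote(species, safe='')}"


def fetch_assembly_info(species: str) -> dict:
    """Fetch assembly information as a dict. Requires network access."""
    request = urllib.request.Request(
        assembly_info_url(species),
        headers={"Content-Type": "application/json"},  # ask for a JSON response
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only the URL is printed here, so the example runs without a network.
    print(assembly_info_url("streptococcus_pneumoniae_tigr4_gca_000006885"))
```

Keeping URL construction separate from the network call makes the first part easy to check offline; fetch_assembly_info can then be called once the server and species name have been confirmed.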

Ensembl Genomes is developed by EMBL-EBI and is powered by the Ensembl software system for the analysis and visualisation of genomic data. For details of our funding please click here.

