Access to over 40,000 Bacterial Genomes

What's New in Release 35

Did you know...?

Ensembl Genomes REST Service

To access Ensembl Genomes data from any programming language, try our REST service. For full documentation, including examples in a wide range of languages, visit http://rest.ensemblgenomes.org
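A REST service can be called with nothing more than an HTTP client and a JSON decoder. The sketch below uses only the Python standard library. The base URL comes from the text above. The endpoint paths (`info/ping`, `lookup/symbol/...`) and the `content-type` query parameter follow the general Ensembl REST conventions and are assumptions here, so check them against the service documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Base URL from the text; endpoint paths passed in below are assumptions.
BASE_URL = "http://rest.ensemblgenomes.org"


def build_url(path: str, **params: str) -> str:
    """Join the base URL, an endpoint path and a URL-encoded query string.

    JSON output is requested through a 'content-type' query parameter
    (assumed to be accepted by the service, as in the Ensembl REST API).
    """
    query = {"content-type": "application/json", **params}
    return f"{BASE_URL}/{path.strip('/')}?{urllib.parse.urlencode(query)}"


def get_json(path: str, **params: str):
    """Fetch an endpoint and decode its JSON body (needs network access)."""
    with urllib.request.urlopen(build_url(path, **params), timeout=30) as resp:
        return json.load(resp)
```

Usage, assuming the service is reachable: `get_json("info/ping")` should return a small status object, and `get_json("lookup/symbol/<species>/ftsZ")` would look up a gene by symbol, where `<species>` must be replaced with a real species name from the site. Keeping URL construction separate from the network call makes the request easy to inspect or log before it is sent.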

Release 35 of EnsemblBacteria has been loaded from EMBL-Bank release 130. The current dataset contains 44,039 genomes (43,552 bacteria and 494 archaea) from 8244 species containing 155,251,477 protein coding genes loaded from 4,922,506 INSDC entries. This release includes 2,460 new genomes, 188 genomes with updated assemblies, 234 genomes with updated annotation, 769 genomes where the assigned name has changed, and 31 genomes removed since the last release. The current database schema is Ensembl v88.

Ensembl Bacteria

Ensembl Bacteria hosts the annotation of over 44,000 prokaryotic (bacterial and archaeal) genomes that have been submitted to the databases of the International Nucleotide Sequence Database Collaboration (INSDC), i.e. the European Nucleotide Archive (ENA) at the EBI, GenBank at the NCBI, and the DNA Data Bank of Japan (DDBJ).

Non-redundant genomes

The ENA houses over 90,000 prokaryotic genome assemblies, including multiple strains of many species. To reduce redundancy, since release 35 (April 2017) we only load new sequences that are relatively non-redundant with the existing dataset, according to the criteria of the UniProt Knowledgebase. All strains that were present in the INSDC archives before this release have already been included in Ensembl Bacteria (regardless of whether they meet the new criteria) and will remain available in future releases.
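The loading policy has two parts: genomes already in the collection are kept unconditionally, and each new genome is admitted only if it is not too similar to anything already loaded. The sketch below illustrates only that control flow. The similarity measure (Jaccard similarity of hypothetical per-genome feature sets) and the `SIMILARITY_THRESHOLD` value are stand-ins chosen for illustration; they are not the UniProt Knowledgebase criteria.

```python
from typing import Dict, FrozenSet, List

# Hypothetical representation of a genome as a set of features
# (e.g. protein family identifiers). Not UniProt's actual method.
Genome = FrozenSet[str]

SIMILARITY_THRESHOLD = 0.9  # hypothetical cut-off, not UniProt's value


def jaccard(a: Genome, b: Genome) -> float:
    """Jaccard similarity of two feature sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def admit_new_genomes(existing: Dict[str, Genome],
                      candidates: Dict[str, Genome]) -> List[str]:
    """Return the IDs of candidate genomes that would be loaded.

    Existing genomes are never re-checked (they stay regardless of the
    new criteria). A candidate is accepted only if it falls below the
    threshold against every loaded genome, including candidates that
    were accepted earlier in the same pass.
    """
    loaded: Dict[str, Genome] = dict(existing)
    accepted: List[str] = []
    for gid, features in candidates.items():
        if all(jaccard(features, other) < SIMILARITY_THRESHOLD
               for other in loaded.values()):
            loaded[gid] = features
            accepted.append(gid)
    return accepted
```

Checking each candidate against previously accepted candidates, not just the pre-existing set, prevents two near-identical new strains from both being loaded in the same release.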

Data access

Data can be visualised through the Ensembl genome browser and accessed programmatically via our Perl and RESTful APIs. Data is also accessible through public MySQL databases and our FTP site containing full data dumps in FASTA, EMBL, GTF, GFF3, JSON and RDF formats.
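As an example of working with the FTP dumps, a GFF3 file can be summarised with a few lines of standard-library Python. GFF3 feature lines have nine tab-separated columns, with the feature type in column 3 and `key=value` attributes separated by `;` in column 9; lines starting with `#` are comments or directives, and a `##FASTA` directive ends the feature section. The sketch assumes a local, possibly gzip-compressed copy of a dump; the file name you pass in is up to you, since the exact FTP layout is not described here.

```python
import gzip
from collections import Counter
from typing import Dict, Iterable
from urllib.parse import unquote


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string such as 'ID=x;Name=y' into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count features by their type (column 3) in GFF3 text lines."""
    counts: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        counts[cols[2]] += 1
    return counts


def count_feature_types_in_file(path: str) -> Counter:
    """Open a plain or .gz GFF3 file as text and count its feature types."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_feature_types(handle)
```

For instance, `count_feature_types_in_file("genome.gff3.gz")["gene"]` (hypothetical file name) would give the number of gene features in a downloaded dump.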

There are no BioMarts currently available for Ensembl Bacteria, but we are developing new, more powerful data mining tools. A selection of over 100 key bacterial genomes has been included in the pan-taxonomic Compara, and genes from all genomes have been classified into families using HAMAP and PANTHER.

Citing Ensembl Genomes

If you've used Ensembl Genomes in your work, please cite the most recent overview article below and the Ensembl Genomes release you retrieved your data from. References for the specific genome assembly can be found on the More information and statistics page for each species (e.g., Escherichia coli).

Kersey PJ, et al. Ensembl Genomes 2016: more genomes, more complexity. Nucleic Acids Res. 2016 Jan;44(D1):D574-80. doi:10.1093/nar/gkv1209. PMID: 26578574; PMCID: PMC4702859.

Ensembl Genomes is developed by EMBL-EBI and is powered by the Ensembl software system for the analysis and visualisation of genomic data.