Upcoming Ensembl Platform Transition

This is the final release of its kind on this website.

In summer 2026, this site will redirect to the new Ensembl platform, currently available at beta.ensembl.org.
To retain access to the current site, tools and functionality until they are available on the new platform, please bookmark this archive: eg63-bacteria.ensembl.org

Repeat feature annotation

If repeat data is present in INSDC when a genome is loaded, those features are imported into Ensembl Genomes. For bacterial genomes, this is currently the only source of repeat data. For other divisions, a computational pipeline is also run to annotate three types of repeat:

  • Low-complexity regions (Dust [1])
  • Tandem repeats (TRF [2])
  • Complex repeats (RepeatMasker [3])
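The low-complexity criterion behind DUST can be sketched as a score over the nucleotide triplets in a window: a triplet that occurs c times contributes c(c-1)/2, and the sum is divided by one less than the number of triplets, so windows dominated by a few triplets score high. The Python below is a naive sliding-window illustration of that score only. It is not the symmetric DUST algorithm from [1] nor the pipeline's actual Dust step, and the window of 64 bp and threshold of 20 are assumed illustrative values.

```python
from collections import Counter


def dust_score(window: str) -> float:
    """Triplet-based low-complexity score for one window:
    sum over distinct triplets of c * (c - 1) / 2, divided by (l - 1),
    where l is the number of (overlapping) triplets in the window."""
    window = window.upper()
    triplets = [window[i:i + 3] for i in range(len(window) - 2)]
    l = len(triplets)
    if l < 2:
        # Too short to score meaningfully.
        return 0.0
    counts = Counter(triplets)
    return sum(c * (c - 1) / 2 for c in counts.values()) / (l - 1)


def low_complexity_windows(seq: str, window: int = 64, threshold: float = 20.0):
    """Return 0-based, half-open (start, end) windows whose score exceeds
    the threshold. Every window position is rescored from scratch, so this
    is a simple illustration rather than an efficient implementation."""
    hits = []
    for start in range(len(seq) - window + 1):
        if dust_score(seq[start:start + window]) > threshold:
            hits.append((start, start + window))
    return hits
```

With these assumed settings, a homopolymer such as "A" * 100 scores 31.0 in every 64-bp window and all 37 windows are reported, whereas a period-4 repeat such as "ACGT" * 25 scores roughly 7.4 per window and none are reported.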

Annotating repeats with RepeatMasker requires a repeat library. In most cases no species-specific library is available, so the Repbase [4] database of eukaryotic repetitive elements is used. Repeat libraries from the following sources are used and combined where possible:

Viewing and accessing repeat features

By default, repeat features are not displayed in the genome browser; to display them, use the "Configure this page" option. You can view all repeats, or only a subset selected by repeat type.

The repeat annotations can be accessed programmatically using the Ensembl Perl API. See the RepeatFeature and RepeatFeatureAdaptor documentation for further details.
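Besides the Perl API, repeat features can also be requested over HTTP from the Ensembl REST API's overlap/region endpoint with feature=repeat. The Python sketch below builds such a request and tallies the returned features. The server URL, the species and region values you pass in, and the assumption that each returned feature carries a "description" field are illustrative; check the REST documentation for the exact response fields.

```python
import json
from collections import Counter
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed REST server; adjust if your division uses a different endpoint.
REST_SERVER = "https://rest.ensembl.org"


def repeat_overlap_url(species: str, region: str) -> str:
    """Build an overlap/region request restricted to repeat features.
    The colon in regions such as '1:100-200' is kept unescaped."""
    return (f"{REST_SERVER}/overlap/region/"
            f"{quote(species)}/{quote(region, safe=':')}?feature=repeat")


def fetch_repeats(species: str, region: str) -> list:
    """Fetch repeat features as a list of JSON objects (needs network access)."""
    req = Request(repeat_overlap_url(species, region),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)


def count_by_description(features: list) -> Counter:
    """Tally features by their 'description' field (assumed to name the
    repeat or its type); features without one are counted as 'unknown'."""
    return Counter(f.get("description", "unknown") for f in features)
```

For example, count_by_description(fetch_repeats("my_species", "1:1-100000")), where "my_species" and the region are hypothetical placeholders, would return a per-description count of the repeats in that region.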

For Ensembl Plants species only, tandem repeats annotated by TRF are not used to soft- or hard-mask the genome sequences.
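The difference between soft-masking (repeat bases in lowercase) and hard-masking (repeat bases replaced by N), and the effect of leaving TRF features out of masking, can be illustrated with a small sketch. It assumes repeats are given as (start, end, logic_name) tuples in 1-based inclusive coordinates and that TRF features carry the logic name "trf"; both are assumptions for illustration, not Ensembl's internal data model or masking code.

```python
from typing import Iterable, Tuple

# Hypothetical repeat representation: (start, end, logic_name),
# 1-based inclusive coordinates.
Feature = Tuple[int, int, str]


def mask_sequence(seq: str, features: Iterable[Feature], hard: bool = False,
                  exclude: Iterable[str] = ()) -> str:
    """Soft-mask (lowercase) or hard-mask (N) the bases covered by repeat
    features, skipping any feature whose logic name is in `exclude`."""
    skip = set(exclude)
    chars = list(seq)
    for start, end, logic_name in features:
        if logic_name in skip:
            continue
        # Convert 1-based inclusive coordinates to 0-based indices,
        # clipping to the sequence bounds.
        for i in range(max(start, 1) - 1, min(end, len(seq))):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


repeats = [(1, 3, "dust"), (6, 8, "trf")]
print(mask_sequence("ACGTACGTAC", repeats))                  # acgTAcgtAC
print(mask_sequence("ACGTACGTAC", repeats, hard=True))       # NNNTANNNAC
print(mask_sequence("ACGTACGTAC", repeats, exclude={"trf"})) # acgTACGTAC
```

The last call mirrors the Ensembl Plants behaviour described above: the TRF feature is still annotated, but it does not contribute to the masked sequence.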

References

  1. Morgulis A et al. (2006) A fast and symmetric DUST implementation to mask low-complexity DNA sequences. J Comput Biol. 13:1028-1040.
  2. Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573-580.
  3. Smit AFA, Hubley R, Green P (1996-2010) RepeatMasker Open-3.0. https://www.repeatmasker.org
  4. Jurka J et al. (2005) Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 110:462-467.