EnsEMBL Registry

Introduction

The Registry system lets you tell your programs where to find the EnsEMBL databases and how to connect to them.

Use

The following call will load all the "latest" databases from the public EnsEMBL MySQL server ensembldb.ensembl.org:

use Bio::EnsEMBL::Registry;

Bio::EnsEMBL::Registry->load_registry_from_db(
  -host    => 'ensembldb.ensembl.org',
  -user    => 'anonymous',
  -verbose => 1
);

The port and password can also be supplied, with the -port and -pass parameters. For each species, the "latest" database is the one with the highest release number.
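To show what a loaded Registry is for, here is a short script that loads the public databases and then asks the Registry for an adaptor. It is a sketch that assumes the EnsEMBL Perl API is installed, that ensembldb.ensembl.org is reachable, and that the alias 'human' is defined for the loaded databases, as it usually is on the public server. Chromosome 13 is an arbitrary choice.

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

# Load the databases from the public server. -port is optional here;
# -pass can be added in the same way for servers that need a password.
$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous',
  -port => 3306,
);

# Ask for an adaptor by species (or alias), database group and adaptor type.
my $slice_adaptor = $registry->get_adaptor('Human', 'Core', 'Slice');

# Use it: fetch chromosome 13 and print its name and length.
my $slice = $slice_adaptor->fetch_by_region('chromosome', '13');
printf "%s\t%d\n", $slice->seq_region_name, $slice->length;
```

Your script never needs to know the database name: the Registry resolves the species, group and adaptor type for you.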

Alternatively, you can load a custom registry configuration file, typically at the beginning of the script, with:

use Bio::EnsEMBL::Registry;

Bio::EnsEMBL::Registry->load_all();

This method loads the Registry from the configuration file passed as an argument. If no argument is supplied, it uses the file named in the ENSEMBL_REGISTRY environment variable. If that is not set either, it falls back to the file .ensembl_init in your home directory.
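The lookup order just described can be written out as a small function. registry_file_to_use is a hypothetical helper, not part of the EnsEMBL API. It only mirrors the order described above and loads nothing, but it can help you check which file a script will pick up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the EnsEMBL API): returns the registry
# file that load_all() would look for, following the documented order.
sub registry_file_to_use {
  my ($explicit_file) = @_;

  # 1. A file passed explicitly as an argument wins.
  return $explicit_file if defined $explicit_file && length $explicit_file;

  # 2. Otherwise use ENSEMBL_REGISTRY, if it is set and non-empty.
  return $ENV{ENSEMBL_REGISTRY}
    if defined $ENV{ENSEMBL_REGISTRY} && length $ENV{ENSEMBL_REGISTRY};

  # 3. Finally fall back to .ensembl_init in the home directory.
  my $home = defined $ENV{HOME} ? $ENV{HOME} : '';
  return "$home/.ensembl_init";
}
```

In an ordinary script you would just call Bio::EnsEMBL::Registry->load_all(), or pass the path explicitly, for example Bio::EnsEMBL::Registry->load_all('/usr/local/share/ensembl_registry.conf').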

Registry Configuration File

The Registry configuration file for the Perl API is a Perl file that defines the DBAdaptors your scripts will need. Start it with:

use strict;

Then import the modules you need:

use Bio::EnsEMBL::Utils::ConfigRegistry;
use Bio::EnsEMBL::DBSQL::DBAdaptor;
use Bio::EnsEMBL::Compara::DBSQL::DBAdaptor;

The first module lets you define aliases for the databases. The second is needed to configure EnsEMBL core databases, and the third to configure EnsEMBL Compara databases. Other database types need their own DBAdaptor modules, for instance to connect to an EnsEMBL Variation database.

Next, declare your DBAdaptors. Each database needs its own object: a Bio::EnsEMBL::DBSQL::DBAdaptor for a core database, a Bio::EnsEMBL::Compara::DBSQL::DBAdaptor for a Compara database, and so on. For each one, give the database host, the user, the port (3306 by default), the database name, the database type (core, compara, variation...) and the species the database refers to. You can also add aliases for the species name with the Bio::EnsEMBL::Utils::ConfigRegistry module. Here is an example for the public human core database (release 70):

Bio::EnsEMBL::DBSQL::DBAdaptor->new(
  -host    => 'ensembldb.ensembl.org',
  -user    => 'anonymous',
  -port    => 3306,
  -species => 'homo_sapiens',
  -group   => 'core',
  -dbname  => 'homo_sapiens_core_70_37'
);

my @aliases = ( 'H_Sapiens', 'Homo sapiens', 'human' );

Bio::EnsEMBL::Utils::ConfigRegistry->add_alias(
  -species => 'homo_sapiens',
  -alias   => \@aliases
);
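A Variation database is declared in the same way, with its own DBAdaptor class. The fragment below is a sketch for the registry configuration file and assumes the EnsEMBL Variation API is installed. The database name homo_sapiens_variation_70_37 is assumed from the core naming pattern, so check it with SHOW DATABASES. The fragment reuses the species name homo_sapiens because the Registry keeps one adaptor per species and group, so a core database and a variation database for the same species do not clash.

```perl
# Registry configuration fragment: a human variation database,
# declared like the core database above.
use Bio::EnsEMBL::Variation::DBSQL::DBAdaptor;

Bio::EnsEMBL::Variation::DBSQL::DBAdaptor->new(
  -host    => 'ensembldb.ensembl.org',
  -user    => 'anonymous',
  -port    => 3306,
  -species => 'homo_sapiens',
  -group   => 'variation',
  -dbname  => 'homo_sapiens_variation_70_37',   # assumed name; verify with SHOW DATABASES
);
```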

With each new release you will have to update the dbname parameter. To find the exact name of the new database, use the MySQL SHOW DATABASES command:

shell> mysql -u anonymous -h ensembldb.ensembl.org -P 3306
mysql> SHOW DATABASES LIKE "homo_sapiens_core_%";
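If you collect the names returned by that query in a script, you can pick the newest release programmatically. latest_core_db is a hypothetical helper that assumes names of the form <species>_core_<release>_<assembly>, such as homo_sapiens_core_70_37. It matches the names with a stricter pattern than the query because, in a MySQL LIKE pattern, "_" matches any single character, so the query can also return names such as homo_sapiens_coreexpressionatlas_70_37.

```perl
use strict;
use warnings;

# Hypothetical helper: return the core database name with the highest
# release number for the given species, or undef if none matches.
sub latest_core_db {
  my ($species, @names) = @_;
  my ($best, $best_release);
  for my $name (@names) {
    # Capture the release number that follows "<species>_core_".
    next unless $name =~ /^\Q$species\E_core_(\d+)_\w+$/;
    if (!defined $best_release || $1 > $best_release) {
      ($best, $best_release) = ($name, $1);
    }
  }
  return $best;
}
```

The release is compared numerically, so release 70 correctly wins over release 9.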

You can choose any species name and add as many aliases as you like, with two restrictions:

  1. No two databases of the same type (group) should share a species name, and no alias should be used for more than one species.
  2. If you intend to use the Compara API, you must use the species' production name. This is normally the binomial name in lowercase with spaces replaced by underscores; for example, Homo sapiens becomes homo_sapiens. To be sure, check the species.production_name key in your database's meta table.
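The usual naming convention from point 2 can be written as a small function. guess_production_name is a hypothetical helper that only applies the convention. The authoritative value is still the species.production_name entry in the meta table, which you can read with a query such as SELECT meta_value FROM meta WHERE meta_key = 'species.production_name';

```perl
use strict;
use warnings;

# Hypothetical helper: apply the usual production-name convention
# (lowercase, whitespace replaced by underscores). Always confirm the
# result against species.production_name in the meta table.
sub guess_production_name {
  my ($binomial) = @_;
  my $name = lc $binomial;
  $name =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
  $name =~ s/\s+/_/g;        # replace internal whitespace with underscores
  return $name;
}

print guess_production_name('Homo sapiens'), "\n";   # prints homo_sapiens
```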

For connecting to the EnsEMBL Compara database, you will have to create a Bio::EnsEMBL::Compara::DBSQL::DBAdaptor. Here is an example:

Bio::EnsEMBL::Compara::DBSQL::DBAdaptor->new(
  -host    => 'ensembldb.ensembl.org',
  -user    => 'anonymous',
  -port    => 3306,
  -species => 'compara',
  -dbname  => 'ensembl_compara_70'
);

@aliases = ( 'ensembl_compara_70', 'compara70', 'compara' );

Bio::EnsEMBL::Utils::ConfigRegistry->add_alias(
  -species => 'compara',
  -alias   => \@aliases
);

Finally, end the file with a true value so that it loads successfully:

1;

Save the File

To make this file your default configuration, save it as .ensembl_init in your home directory. Alternatively, save it elsewhere and point the ENSEMBL_REGISTRY environment variable at it. For example, depending on your shell:

  • Under bash:
    ENSEMBL_REGISTRY="/usr/local/share/ensembl_registry.conf"
    export ENSEMBL_REGISTRY

  • Under csh or tcsh:
    setenv ENSEMBL_REGISTRY "/usr/local/share/ensembl_registry.conf"

EnsEMBL Software Support

EnsEMBL is an open project, and we encourage correspondence and discussion on any aspect of EnsEMBL. See the EnsEMBL Contacts page for ways to get in touch with us.

More details

Full documentation of the Registry itself is also available. Using methods not covered in the tutorials should be considered advanced usage.