Perl API Installation

Introduction

All data sets in the Ensembl system are stored in relational databases (MySQL). For each of the Ensembl databases, the project provides a specific Perl API. Because Ensembl also takes advantage of code provided by the BioPerl project, installation of the BioPerl package is included in these instructions. The Ensembl API is compatible with Perl versions 5.14 through 5.26.
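
To confirm that your Perl falls inside the supported range, you can compare the numeric version that Perl reports in `$]` (for example `5.026001` for Perl 5.26.1). The sketch below uses a hypothetical helper, `perl_version_supported`, which is not part of the Ensembl distribution. It treats everything from 5.014 up to, but not including, 5.027 as supported, which is our reading of "5.14 through 5.26".

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Ensembl): succeeds if a Perl version
# in the numeric form of $] (e.g. 5.026001) lies between 5.14 and 5.26.
perl_version_supported() {
    awk -v v="$1" 'BEGIN { v += 0; exit !(v >= 5.014 && v < 5.027) }'
}

# Check the first perl found on PATH, if there is one.
if command -v perl >/dev/null 2>&1; then
    current=$(perl -e 'print $]')
    if perl_version_supported "$current"; then
        echo "Perl $current is in the supported range"
    else
        echo "Perl $current is outside the supported range (5.14 to 5.26)"
    fi
fi
```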

Video Tutorial

Ensembl has produced a video tutorial on how to install the API. Its content is based on this document, so you can follow both resources when performing an installation. All the commands used in the video can be found in the following document on our FTP site.

YouTube channel

Installation Procedure

There are two ways of installing the Perl API: you can clone it from GitHub using Git, if you have it available, or you can download the files in gzipped TAR format from our FTP site. You will also need the BioPerl 1.6.924 core modules (bioperl-live).

N.B. We recommend waiting until a few days after a release before downloading the new API (or re-downloading after a few days), as there may be post-release bug fixes added to the code.
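
The steps below use the FTP archive. The following is a minimal sketch of the Git route. It assumes that each Ensembl repository has a `release/<version>` branch matching the `release/111` archives used further down this page, and it clones BioPerl straight into the `bioperl-1.6.924` directory name used in PERL5LIB. The helper names `run` and `clone_ensembl` and the `DRY_RUN` switch are illustrative only. With `DRY_RUN=1`, the commands are printed instead of executed, so you can review them first.

```shell
#!/bin/sh
# Illustrative sketch: clone the repositories used on this page at one release.
# ASSUMPTION: every Ensembl repository has a "release/<N>" branch.
# Set DRY_RUN=1 to print the git commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

clone_ensembl() {
    release=$1
    for repo in ensembl ensembl-compara ensembl-variation ensembl-funcgen ensembl-io; do
        run git clone --depth 1 --branch "release/$release" \
            "https://github.com/Ensembl/$repo.git"
    done
    # BioPerl 1.6.924, cloned into the directory name expected in PERL5LIB.
    run git clone --depth 1 --branch release-1-6-924 \
        https://github.com/bioperl/bioperl-live.git bioperl-1.6.924
}

# Example: print the commands for release 111 from inside ~/src.
# cd ~/src && DRY_RUN=1 clone_ensembl 111
```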

  1. Create an installation directory and download the distributions:

    $ cd
    $ mkdir src
    $ cd src
    $ wget https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacteria/ensembl-api.tar.gz
    $ wget https://github.com/bioperl/bioperl-live/archive/release-1-6-924.zip
    
  2. Unpack the downloaded files. In the Unix command line, type:

    $ tar zxvf ensembl-api.tar.gz
    $ unzip release-1-6-924.zip
    

    In Windows, you will need an unzipping utility such as 7-Zip.

  3. Rename the bioperl-live directory. In the Unix command line, type:

    $ mv bioperl-live-release-1-6-924 bioperl-1.6.924
    

    In the classic Windows command prompt, use ren instead of mv.

  4. Set up your environment:

    You have to tell Perl where to find the modules you just installed. You can do this with a use lib statement in your script, but if you want to make these modules available to all your scripts, the best way is to add them to the PERL5LIB environment variable.

    • Under bash, ksh, or any sh-derived shell:

      PERL5LIB=${PERL5LIB}:${HOME}/src/bioperl-1.6.924
      PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl/modules
      PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-compara/modules
      PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-variation/modules
      PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-funcgen/modules
      export PERL5LIB
      
    • Under csh or tcsh (the first line defines PERL5LIB if it is not already set, because csh reports an error when an undefined variable is used):

      if (! $?PERL5LIB) setenv PERL5LIB ""
      setenv PERL5LIB ${PERL5LIB}:${HOME}/src/bioperl-1.6.924
      setenv PERL5LIB ${PERL5LIB}:${HOME}/src/ensembl/modules
      setenv PERL5LIB ${PERL5LIB}:${HOME}/src/ensembl-compara/modules
      setenv PERL5LIB ${PERL5LIB}:${HOME}/src/ensembl-variation/modules
      setenv PERL5LIB ${PERL5LIB}:${HOME}/src/ensembl-funcgen/modules
      
    • Under Windows (assuming you installed the APIs in C:\src\):

      set PERL5LIB=C:\src\bioperl-1.6.924;C:\src\ensembl\modules;C:\src\ensembl-compara\modules;C:\src\ensembl-variation\modules;C:\src\ensembl-funcgen\modules
      
    • In Perl (we do not recommend creating hard-coded dependencies in Perl scripts):

      use lib "$ENV{HOME}/src/bioperl-1.6.924";
      use lib "$ENV{HOME}/src/ensembl/modules";
      use lib "$ENV{HOME}/src/ensembl-compara/modules";
      use lib "$ENV{HOME}/src/ensembl-variation/modules";
      use lib "$ENV{HOME}/src/ensembl-funcgen/modules";
      	
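
Appending paths by hand is easy to get wrong. A typo silently leaves a directory out, and re-running the lines adds duplicates. The sketch below shows one way to guard against both in sh-derived shells. The function name `perl5lib_add` is hypothetical, and the directory list mirrors the bash example above, so adjust it to your own layout. Directories that do not exist are reported rather than added, which also confirms that steps 1 to 3 unpacked everything where expected.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to PERL5LIB only if it exists and
# is not already listed. Missing directories are reported on stderr.
perl5lib_add() {
    if [ ! -d "$1" ]; then
        echo "warning: $1 does not exist, not added to PERL5LIB" >&2
        return 1
    fi
    case ":${PERL5LIB:-}:" in
        *":$1:"*) ;;                              # already present, skip
        *) PERL5LIB=${PERL5LIB:+$PERL5LIB:}$1 ;;  # no leading colon if empty
    esac
    export PERL5LIB
}

for dir in \
    "$HOME/src/bioperl-1.6.924" \
    "$HOME/src/ensembl/modules" \
    "$HOME/src/ensembl-compara/modules" \
    "$HOME/src/ensembl-variation/modules" \
    "$HOME/src/ensembl-funcgen/modules"
do
    perl5lib_add "$dir"
done
```

Load this file with the dot command (for example `. ./perl5lib.sh`, a hypothetical file name) rather than running it as a separate script. Otherwise, the exported PERL5LIB disappears when the script exits.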

Variation genotype and frequency data

To retrieve genotype, frequency and linkage disequilibrium (LD) data for 1000 Genomes phase 3 variants, you need to install two extra dependencies:

  1. htslib, the Bio-DB-HTS Perl module and the ensembl-variation C code:

    cd ~/src
    git clone --recurse-submodules --branch master --depth 1 https://github.com/samtools/htslib.git
    cd htslib
    make
    export HTSLIB_DIR=${HOME}/src/htslib/
    cd ..
    
    git clone https://github.com/Ensembl/Bio-DB-HTS.git
    cd Bio-DB-HTS
    perl Build.PL
    ./Build
    export PERL5LIB=$PERL5LIB:${HOME}/src/Bio-DB-HTS/lib:${HOME}/src/Bio-DB-HTS/blib/arch/auto/Bio/DB/HTS/:${HOME}/src/Bio-DB-HTS/blib/arch/auto/Bio/DB/HTS/Faidx
    cd ..
    
    cd ensembl-variation/C_code/
    make
    cd ../../
    
    

    Then set up the environment. If you ran an installation step (make && make install), add the path it printed to the PERL5LIB variable, e.g.:

    PERL5LIB=${PERL5LIB}:${HOME}/src/lib/perl/5.14.4/
    export PERL5LIB
    
      
  2. The ensembl-io Perl modules (only if you did not use the Ensembl Git tools to install the API):

    cd ~/src
    wget -O ensembl-io-111.zip https://github.com/Ensembl/ensembl-io/archive/release/111.zip
    unzip ensembl-io-111.zip
    mv ensembl-io-release-111 ensembl-io

    Add its modules directory to PERL5LIB:

    • Under bash, ksh, or any sh-derived shell:

      PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-io/modules
      export PERL5LIB
    • Under csh or tcsh:

      setenv PERL5LIB ${PERL5LIB}:${HOME}/src/ensembl-io/modules
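
The ensembl-io and ensembl-metadata archives are both published as `111.zip`. Downloading them into the same directory can therefore make wget save the second one as `111.zip.1`, after which `unzip 111.zip` unpacks the wrong archive. The sketch below avoids the clash by giving each download its own file name. The function name `fetch_release` is hypothetical. The sketch assumes GitHub's usual naming, in which `archive/release/111.zip` unpacks to `<repository>-release-111`.

```shell
#!/bin/sh
# Hypothetical helper: download one Ensembl repository at a given release from
# GitHub, unpack it and rename it to the plain repository name.
# ASSUMPTION: archive/release/<N>.zip unpacks to <repo>-release-<N>.
fetch_release() {
    repo=$1
    release=$2
    zip="$repo-$release.zip"    # unique file name, so archives never clash
    wget -O "$zip" "https://github.com/Ensembl/$repo/archive/release/$release.zip" &&
        unzip -q "$zip" &&
        mv "$repo-release-$release" "$repo" &&
        rm "$zip"
}

# Example, from inside ~/src:
# fetch_release ensembl-io 111
# fetch_release ensembl-metadata 111
```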

Non-vertebrates

If you are working with non-vertebrate genomes, you will also need the ensembl-metadata modules (only if you did not use the Ensembl Git tools to install the API):

    cd ~/src
    wget -O ensembl-metadata-111.zip https://github.com/Ensembl/ensembl-metadata/archive/release/111.zip
    unzip ensembl-metadata-111.zip
    mv ensembl-metadata-release-111 ensembl-metadata

Add its modules directory to PERL5LIB:

  • Under bash, ksh, or any sh-derived shell:

    PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-metadata/modules
    export PERL5LIB
  • Under csh or tcsh:

    setenv PERL5LIB ${PERL5LIB}:${HOME}/src/ensembl-metadata/modules

Debugging an Installation

Sometimes installations can go wrong. Follow our debugging installation guide to diagnose and resolve installation issues.
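
A common first check is whether Perl can load each module from your current PERL5LIB. The loop below is a small sketch of that idea, not a tool from the debugging guide. The module names are examples from BioPerl and the core, Compara, Variation and Funcgen distributions, so trim the list to the parts you installed. A failure can also come from a missing CPAN dependency rather than a wrong path. The first error line that Perl prints usually names the file it could not find.

```shell
#!/bin/sh
# Sketch: report which Perl modules load with the current PERL5LIB.
# Returns non-zero if any module fails to load.
check_modules() {
    status=0
    for module in "$@"; do
        if err=$(perl -M"$module" -e 1 2>&1); then
            echo "ok      $module"
        else
            echo "FAILED  $module"
            # Show Perl's first error line, indented under the module name.
            echo "$err" | head -n 1 | sed 's/^/        /'
            status=1
        fi
    done
    return "$status"
}

check_modules Bio::Perl Bio::EnsEMBL::Registry \
    Bio::EnsEMBL::Compara::DBSQL::DBAdaptor \
    Bio::EnsEMBL::Variation::DBSQL::DBAdaptor \
    Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor
```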

Tips for Windows and Mac Users

Ensembl can be installed on both Windows and Mac machines; however, installation is not as straightforward as on Linux. We recommend that you consult our two blog posts describing how to install Ensembl on Windows and on OS X. The fastest way to get up and running with Ensembl on these operating systems is to use our virtual machine.