Assembly Converter setup instructions
Our current version of the Assembly Converter uses CrossMap.
System requirements
CrossMap is a Python script designed to run on Linux and OS X. On OS X you will also need to install the Xcode command line tools.
Installation and setup
- Install Python and prerequisite packages (including gcc, numpy and cython), and download the CrossMap script - full details are available on the CrossMap project page.
- Copy the CrossMap script to a location where python can find it. It's a good idea at this point to run CrossMap from the command-line, to test that your basic setup is correct.
- In public-plugins/mirror/SiteDefs.pm, configure the locations of your
CrossMap files, e.g.
  $SiteDefs::ASSEMBLY_CONVERTER_BIN_PATH = '/usr/local/bin/python/CrossMap.py';
  $SiteDefs::ENSEMBL_CHAIN_FILE_DIR = '/usr/local/ensembl/tools_data/assembly_converter';
- Download the pre-generated chain files from our FTP site and put them in the above directory, making sure to keep them in their per-species subdirectories. Note: DO NOT UNZIP THEM.
- If you wish to input/output VCF files, you will also need the toplevel FASTA files
for each assembly that you wish to convert. Example download links for human are:
- GRCh38: https://ftp.ensembl.org/pub/release-76/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
- GRCh37: https://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.toplevel.fa.gz
- NCBI36: https://ftp.ensembl.org/pub/release-54/fasta/homo_sapiens/dna/Homo_sapiens.NCBI36.54.dna.toplevel.fa.gz
(If you are also using the VEP, you could instead symlink to the FASTA files in the VEP cache directories to save space.)
These files will need to be put in the same directory as the chain files and UNZIPPED.
Your completed data structure should look something like this:
`-- /usr/local/ensembl/tools_data/assembly_converter
    |-- homo_sapiens
    |   |-- GRCh37_to_GRCh38.chain.gz
    |   |-- GRCh38_to_GRCh37.chain.gz
    |   |-- Homo_sapiens.GRCh37.dna.toplevel.fa
    |   |-- Homo_sapiens.GRCh38.dna.toplevel.fa
    |-- mus_musculus
    |   |-- GRCm38_to_NCBIM37.chain.gz
    |   |-- Mus_musculus.GRCm38.dna.toplevel.fa
    |   |-- Mus_musculus.NCBIm37.dna.toplevel.fa
    |   |-- NCBIM37_to_GRCm38.chain.gz
- You should now restart your webserver (restart -r) to pick up the configuration changes.
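Before restarting, it may be worth checking the data directory described above. The following is a minimal sketch, not part of CrossMap or the Ensembl codebase: check_layout is a hypothetical helper that assumes one subdirectory per species, chain files named *.chain.gz that are still gzip-compressed, and FASTA files named *.fa that have been unzipped.

```python
from pathlib import Path


def check_layout(root):
    """Return a list of problems found under the chain file directory.

    Hypothetical helper. Assumes the layout shown above: one subdirectory
    per species, holding gzipped *.chain.gz files and unzipped *.fa files.
    """
    root = Path(root)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    species_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    if not species_dirs:
        problems.append(f"no per-species subdirectories in {root}")

    for species in species_dirs:
        chains = sorted(species.glob("*.chain.gz"))
        if not chains:
            problems.append(f"{species.name}: no .chain.gz files")
        for chain in chains:
            # Chain files must stay compressed: look for the gzip magic bytes.
            with open(chain, "rb") as handle:
                if handle.read(2) != b"\x1f\x8b":
                    problems.append(
                        f"{species.name}/{chain.name}: not gzip-compressed"
                    )
        # A bare .chain file suggests a chain file was unzipped by mistake.
        for stray in sorted(species.glob("*.chain")):
            problems.append(
                f"{species.name}/{stray.name}: chain file has been unzipped"
            )
        # FASTA files, if present, need to be unzipped.
        for stray in sorted(species.glob("*.fa.gz")):
            problems.append(
                f"{species.name}/{stray.name}: FASTA file is still zipped"
            )
    return problems
```

Calling check_layout with the directory configured in ENSEMBL_CHAIN_FILE_DIR returns an empty list when nothing looks wrong under these assumptions. It only inspects file names and the first two bytes of each chain file, so it does not confirm that the files are valid for CrossMap.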