VCF to PED Converter

The VCF to PED converter allows users to parse a vcf file (specification) to create a linkage pedigree file (ped) and a marker information file, which together may be loaded into linkage disequilibrium (LD) visualization tools such as Haploview. The tool is available both as an online version and as a Perl script.
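To make the two output files more concrete, the following minimal sketch builds one row of a linkage-format pedigree file and one row of a marker information file in the general layout that Haploview reads. The family ID, sample ID, marker name, position and genotype codes are invented for illustration. The values the converter itself writes in the leading columns are not described here, so zeros are used as placeholders.

```python
# Illustrative only: general layout of a linkage pedigree (ped) row and a
# marker information (info) row. All identifiers and values are made up.

def ped_line(family_id, sample_id, genotypes):
    """Build one ped row: six leading columns, then two alleles per marker."""
    # Father, mother, sex and phenotype are set to 0 here as placeholders.
    leading = [family_id, sample_id, "0", "0", "0", "0"]
    alleles = [allele for pair in genotypes for allele in pair]
    return "\t".join(leading + alleles)


def info_line(marker_name, position):
    """Build one info row: marker name followed by its position."""
    return f"{marker_name}\t{position}"


row = ped_line("FAM1", "SAMPLE1", [("1", "3"), ("2", "2")])
marker = info_line("marker1", 100250)
print(row)
print(marker)
```

Each individual contributes one ped row, and each marker contributes one info row, in the same order as the genotype pairs in the ped file.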


Online version

The documentation for the online version can be found by clicking on the icon at the top of the VCF to PED converter page.


API script

A Perl API script version of the converter tool is available from the FTP site.

This script converts locally or remotely accessible vcf files to linkage pedigree files. If the input file is only remotely accessible then it must be compressed by bgzip and indexed by tabix. There is no requirement to compress vcf files if they are held locally, but large files will be read more quickly using tabix. If the vcf file is compressed then you must have tabix installed.
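The rules in the paragraph above can be sketched as a small pre-flight check. This is not part of the converter script; the function names are hypothetical, and the sketch assumes that a compressed file can be recognised by a .gz suffix, which is a naming convention rather than a check of the file contents.

```python
import shutil
from urllib.parse import urlparse


def is_remote(path):
    """True if the path looks like a URL rather than a local file."""
    return urlparse(path).scheme in ("http", "https", "ftp")


def is_compressed(path):
    """Assumption: bgzip-compressed vcf files are named with a .gz suffix."""
    return path.endswith(".gz")


def check_vcf_input(path, tabix_path=None):
    """Return a list of problems with the input, following the rules above."""
    problems = []
    if is_remote(path) and not is_compressed(path):
        problems.append("remote vcf files must be bgzip-compressed and tabix-indexed")
    if is_compressed(path):
        # When no tabix path is given, fall back to searching PATH.
        tabix = tabix_path or shutil.which("tabix")
        if tabix is None:
            problems.append("tabix is required to read compressed vcf files")
    return problems


print(check_vcf_input("/data/calls.vcf"))
```

An uncompressed local file passes with no problems reported, while a compressed file passes only when a tabix executable is supplied or found on PATH.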

The script is run from the command line and it takes the following arguments:

-vcf (required argument)
Path to a locally or remotely accessible tabix indexed vcf file.

-sample_panel_file (required argument)
Path to a locally or remotely accessible sample panel file, listing all individuals (first column) and their population (second column).

-population (required argument)
A population name, which must appear in the second column of the sample panel file. Can be specified more than once for multiple populations.

-tabix (optional argument)
Path to the tabix executable. If the vcf file is compressed and this argument is not specified, the default is to search PATH for ‘tabix’.

-output_ped (optional argument)
Name of the output ped file. The default name is region.ped (e.g. 1_100000-100500.ped).

-output_info (optional argument)
Name of the output info file (marker information file). The default name is region.info (e.g. 1_100000-100500.info).

-output_dir (optional argument)
Name of a directory in which to put the output files.

-base_format (optional argument: number or letter)
Defaults to number. If letter is specified, the genotypes will be expressed as letters (A, T, G, C) rather than as numbers. By default, this script uses the old style of plink allele annotation, in which A => 1, C => 2, G => 3 and T => 4.

-help (optional argument)
Print out the help documentation.
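The way several of these arguments are described can be illustrated with a short sketch. It is not taken from the script itself: the function names and the panel content are hypothetical, the default file name rule is inferred from the example names given above (region 1:100000-100500 becoming 1_100000-100500.ped), and the handling of missing genotypes or non-SNP alleles is not covered.

```python
import io

# Old-style plink numeric allele codes, as listed under -base_format.
LETTER_TO_NUMBER = {"A": "1", "C": "2", "G": "3", "T": "4"}


def samples_in_populations(panel_lines, populations):
    """Return sample IDs (column 1) whose population (column 2) was requested."""
    wanted = set(populations)
    selected = []
    for line in panel_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] in wanted:
            selected.append(fields[0])
    return selected


def default_output_names(region):
    """Derive default ped/info names from a region such as 1:100000-100500."""
    stem = region.replace(":", "_")
    return f"{stem}.ped", f"{stem}.info"


def encode_allele(base, base_format="number"):
    """Express a base as a letter or as its old-style plink number."""
    if base_format == "letter":
        return base
    return LETTER_TO_NUMBER[base]


# Hypothetical panel content; a real panel file may have more columns.
panel = io.StringIO("S1\tGBR\nS2\tFIN\nS3\tYRI\n")
print(samples_in_populations(panel, ["GBR", "FIN"]))
print(default_output_names("1:100000-100500"))
print(encode_allele("G"), encode_allele("G", "letter"))
```

Passing several population names selects the union of their samples, which corresponds to specifying -population more than once.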

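Here is an example of a command line for running the script. Note that it also passes a -region argument, which is not described in the list above; in this example it gives the chromosomal region in the form chromosome:start-end.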

perl vcf_to_ped_converter.pl -vcf https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr13.phase1_integrated_calls.20101123.snps_indels_svs.genotypes.vcf.gz -sample_panel_file https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/phase1_integrated_calls.20101123.ALL.sample_panel -region 13:32889611-32973805 -population GBR -population FIN
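When the converter has to be run for many regions or population sets, it can be convenient to assemble the same command programmatically. The following is a minimal sketch under the assumption that perl, the script and tabix are available locally; build_command and run_converter are hypothetical helper names, and only the arguments shown in the example above are used.

```python
import subprocess

BASE = "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/"


def build_command(vcf, panel, region, populations, script="vcf_to_ped_converter.pl"):
    """Assemble the argument list for the converter script."""
    cmd = ["perl", script, "-vcf", vcf, "-sample_panel_file", panel, "-region", region]
    for population in populations:
        # -population may be given more than once.
        cmd += ["-population", population]
    return cmd


def run_converter(cmd):
    """Run the command; needs perl, the script, tabix and network access."""
    subprocess.run(cmd, check=True)


cmd = build_command(
    BASE + "ALL.chr13.phase1_integrated_calls.20101123.snps_indels_svs.genotypes.vcf.gz",
    BASE + "phase1_integrated_calls.20101123.ALL.sample_panel",
    "13:32889611-32973805",
    ["GBR", "FIN"],
)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems; run_converter(cmd) is deliberately not called here because it depends on the local installation.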