SCIMP: Scan for Imperfect Phylogenies
This webserver finds an unrooted maximum parsimony phylogeny, under the assumption that all point mutations are equally likely, and can scan any region of the genome to detect large imperfection (recurrent mutations or recombinations).
Recommended HapMap window size: 5 up to and including 9 (these sizes are pre-processed). Maximum recommended number of varying SNPs for imperfect phylogeny reconstruction: 50.
Input options:
Users can either enter their own SNP-haplotype data (see format below) or use the data-set published by the International HapMap consortium.
User-specified SNP Data: The file is white-space and endline ('\n') delimited. The first two numbers (white-space and/or endline delimited) should be the number of haplotype sequences and the number of SNPs, respectively. Each subsequent line corresponds to one haplotype sequence, with its sites white-space delimited. Each site should take either the value 0 (one allele) or 1 (the other allele); missing data are not allowed. In short, the set of haplotypes should form an n x m binary matrix, where n is the number of rows (number of sequences) and m is the number of columns (number of sites). Examples (also provided on the main page):
small, larger, harder
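The format described above can be checked before upload with a short script. The following is a minimal sketch, not the server's own parser: the function name `parse_haplotypes` and the example data are made up for illustration, and the sketch treats all white space alike, so it does not verify that each haplotype sits on its own line.

```python
def parse_haplotypes(text):
    """Parse 'n m' followed by n*m binary site values into an n x m matrix.

    Hypothetical helper for illustration; assumes every site is a separate
    white-space-delimited token, as in the format description.
    """
    tokens = text.split()
    if len(tokens) < 2:
        raise ValueError("expected the number of sequences and the number of SNPs")
    n, m = int(tokens[0]), int(tokens[1])
    values = tokens[2:]
    if len(values) != n * m:
        raise ValueError(f"expected {n * m} site values, found {len(values)}")
    for v in values:
        # Only 0 and 1 are allowed; there is no symbol for missing data.
        if v not in ("0", "1"):
            raise ValueError(f"sites must be 0 or 1, found {v!r}")
    # Cut the flat list of values into n rows of m sites each.
    return [[int(v) for v in values[r * m:(r + 1) * m]] for r in range(n)]


# Made-up example: 3 haplotype sequences, 4 SNPs.
example = """3 4
0 0 1 1
0 1 0 1
1 1 0 0
"""
matrix = parse_haplotypes(example)
print(matrix)
```

A file that declares the wrong dimensions or contains any symbol other than 0 or 1 raises a `ValueError` in this sketch.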
HapMap Data: Currently we support analysis of two populations, CEU and YRI. The set of contiguous SNPs that the user is interested in can be specified in three ways.
Genotyped SNP Number: This corresponds to the i-th SNP as genotyped by the HapMap project. Notice that these numbers are consecutive, starting from 1 and ending at the number of SNPs genotyped for that chromosome.
Chromosome Position: The physical location of the site. If a specified site was not genotyped, the first genotyped SNP after the starting value and the last genotyped SNP before the ending value are used. The easiest way to browse the imperfection is to first use the genotyped SNP number option with a pre-processed window size, and then use the output as a guide to focus on the physical chromosome positions of the SNPs of interest.
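The snapping of a chromosome position range to genotyped SNP numbers can be pictured with a binary search over the sorted SNP positions. This is a sketch under stated assumptions, not the server's code: `snp_range_for_positions` is a hypothetical name, the positions are made up, and the returned numbers are 1-based to match the genotyped SNP numbering described above.

```python
from bisect import bisect_left, bisect_right


def snp_range_for_positions(positions, start, end):
    """Map a physical range [start, end] to 1-based genotyped SNP numbers.

    `positions` must be the sorted physical positions of the genotyped SNPs
    on one chromosome. Returns (first, last), or None if no genotyped SNP
    lies within the range.
    """
    first = bisect_left(positions, start)    # first SNP at or after start
    last = bisect_right(positions, end) - 1  # last SNP at or before end
    if first > last:
        return None
    return first + 1, last + 1


# Made-up positions for five genotyped SNPs.
positions = [1000, 1500, 2200, 3100, 4000]
print(snp_range_for_positions(positions, 1200, 3500))  # (2, 4)
```

Here neither 1200 nor 3500 is a genotyped position, so the range snaps inward to the SNPs at 1500 and 3100, which are genotyped SNP numbers 2 and 4.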
Analysis:
The user has two options to analyze the data. The first is to scan a region of the genome to identify regions of high imperfectness (recurrent mutations/recombinations). The second is to reconstruct the most parsimonious phylogeny for a specified set of sequences.
Scan: The user specifies a window size. A sliding window of SNPs is used to compute the most parsimonious phylogeny, and therefore the imperfectness (number of recurrent mutations), of each window. The output is an imperfectness scan for the region under study. An imperfectness of k indicates that the region MUST have undergone k recurrent mutations (in the absence of recombinations).
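The sliding-window idea can be illustrated with a much simpler test than the one the server performs. The sketch below does not compute the imperfectness k; it only separates windows with imperfectness 0 from windows with imperfectness greater than 0, using the standard four-gamete test (a binary matrix admits a perfect phylogeny exactly when no pair of sites shows all four combinations 00, 01, 10 and 11). The function names and the example matrix are made up for illustration.

```python
from itertools import combinations


def has_perfect_phylogeny(rows):
    """Four-gamete test on a binary matrix given as a list of rows."""
    if not rows:
        return True
    m = len(rows[0])
    for i, j in combinations(range(m), 2):
        # Collect the distinct (site i, site j) allele pairs over all rows.
        if len({(r[i], r[j]) for r in rows}) == 4:
            return False
    return True


def scan(matrix, window):
    """Yield (start_column, is_perfect) for each window of consecutive SNPs."""
    m = len(matrix[0])
    for start in range(m - window + 1):
        rows = [r[start:start + window] for r in matrix]
        yield start, has_perfect_phylogeny(rows)


# Made-up data: 4 haplotypes, 4 SNPs.
matrix = [
    [0, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]
result = list(scan(matrix, 2))
print(result)  # [(0, False), (1, True), (2, True)]
```

In this example only the first window (SNP columns 0 and 1) contains all four gametes, so it is the only window flagged as imperfect.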
Imperfection: This simply reconstructs the most parsimonious phylogeny for the specified sequences. Note that unlike heuristics that attempt to maximize parsimony, our method is guaranteed to return the most parsimonious phylogeny. In other words, our results should match those of exhaustive (brute-force) search or branch-and-bound, but are obtained significantly faster.
Phylogeny: Each green vertex corresponds to one or more input rows, which are indicated by the numbers within the vertex (see below for HapMap). Each blue vertex corresponds to a Steiner (ancestral) vertex, which is simply used to link the input rows. Each edge denotes one or more mutations and is annotated with the site(s) that mutate. If HapMap data is used for a scan, then the individuals in the resulting phylogenies are numbered from 0 through 119. The individual IDs (in order) are given in these files. The IDs appear twice since there are two haplotypes per individual.
CEU
YRI