Group-8: Artika Nath, Piyush Ranjan, Angela Pena and Monica Rojas
Pipeline

A: Selecting the Exome Sequence from 1000 Genomes Project
The exome selected from the 1000 Genomes Project for exome analysis and variant calling was HG01112, an individual from the Colombian (CLM) population. The following BAM and BAI files represent the exome:
HG01112.mapped.ILLUMINA.bwa.CLM.exome.20111114.bam
HG01112.mapped.ILLUMINA.bwa.CLM.exome.20111114.bam.bai
The reference genome used for this exome analysis was the 1000 Genomes Project phase II reference, hs37d5.fa.gz, an integrated reference sequence built on the GRCh37 primary assembly (chromosomal plus unlocalized and unplaced contigs).
B: Variant Calling Using Different Tools
(i) Generating BCF files
We used SAMtools mpileup to call the variants, which were initially written to a BCF file.
Command: samtools mpileup -ugf hs37d5.fa HG01112.mapped.ILLUMINA.bwa.CLM.exome.20111114.bam | bcftools view -bvcg - > HG01112.bcf
(ii) Generating VCF files
The BCF file was converted to a VCF file using bcftools, which is part of the SAMtools package.
Command: bcftools view HG01112.bcf | vcfutils.pl varFilter -D 100 > HG01112.vcf
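To make the effect of the -D 100 option concrete: in vcfutils.pl varFilter, -D sets the maximum read depth, so calls with higher depth are discarded. The following is a minimal Python sketch of that single criterion only. It assumes the depth is stored under the DP key of the INFO column; the function names and the toy records are hypothetical, and the real varFilter applies several other filters that are not reproduced here.

```python
def info_depth(info_field):
    """Return the integer DP value from a VCF INFO string, or None if absent."""
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP" and value.isdigit():
            return int(value)
    return None


def filter_max_depth(vcf_lines, max_depth=100):
    """Yield header lines and records whose DP does not exceed max_depth."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        depth = info_depth(fields[7])  # INFO is the 8th VCF column
        # Records without a DP entry are kept; this is a choice made for the sketch.
        if depth is None or depth <= max_depth:
            yield line


# Toy records with hypothetical positions and values, not taken from HG01112.
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "2\t1000\t.\tC\tT\t50\t.\tDP=35;MQ=60",
    "5\t2000\t.\tG\tA\t80\t.\tDP=250;MQ=60",
]
kept = list(filter_max_depth(toy_vcf))
print(len(kept))
```

A very high depth in exome data often points to duplicated or repetitive regions where reads pile up, which is the usual motivation for capping depth.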
C: Annotating and Filtering the Variants using Different Tools
(i) VAT: The Variant Annotation Tool (Habegger et al. 2012) is a computational framework for functionally annotating variants in the exome, and it can run in a cloud-computing environment. VAT uses GENCODE, which is part of the ENCODE project, to annotate the variants.
Variants were filtered to retain only nonsynonymous changes and premature stops.
Command: cat HG01112.vcf | snpMapper ../VAT/gencode7.interval ../VAT/gencode7.fa > HG01112_annotated.vcf
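The filtering step described above (keeping only nonsynonymous changes and premature stops) could be sketched in Python as follows. This is an illustration under stated assumptions, not the filter that was actually run: it assumes the annotation sits in the INFO column under a VA= key with colon-separated fields, and that the consequence labels are spelled nonsynonymous and prematureStop. Both should be checked against the actual snpMapper output. The toy records and gene names are hypothetical.

```python
KEEP_TYPES = {"nonsynonymous", "prematureStop"}  # assumed VAT consequence labels


def annotation_tokens(info_field):
    """Collect all colon-separated tokens from an assumed 'VA=' INFO entry."""
    tokens = set()
    for entry in info_field.split(";"):
        if not entry.startswith("VA="):
            continue
        # Assumed layout: one annotation per comma, fields separated by colons.
        for annotation in entry[len("VA="):].split(","):
            tokens.update(annotation.split(":"))
    return tokens


def keep_record(vcf_line):
    """True if the record carries at least one of the wanted consequence labels."""
    fields = vcf_line.rstrip("\n").split("\t")
    return bool(annotation_tokens(fields[7]) & KEEP_TYPES)


# Hypothetical annotated records for illustration only.
records = [
    "2\t1000\t.\tC\tT\t50\t.\tDP=35;VA=1:GENE_A:ID_A:+:nonsynonymous:1/1",
    "5\t2000\t.\tG\tA\t80\t.\tDP=40;VA=1:GENE_B:ID_B:-:synonymous:1/1",
    "15\t3000\t.\tC\tT\t70\t.\tDP=22;VA=1:GENE_C:ID_C:-:prematureStop:1/2",
    "7\t4000\t.\tA\tG\t60\t.\tDP=18",
]
selected = [r for r in records if keep_record(r)]
print(len(selected))
```

Matching on whole tokens rather than substrings keeps "synonymous" from matching inside "nonsynonymous".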
(ii) ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data (Wang et al. 2010). ANNOVAR annotates single nucleotide variants and indels. It also identifies variants that fall in conserved regions and variants that have already been reported in the 1000 Genomes Project and dbSNP.
Filtering using ANNOVAR was done as shown in the table below:

(iii) GATK for variant filtration: After running two or more variant annotation or analysis programs, each of which outputs a VCF file, the outputs have to be combined into a single VCF file. This was done using the GATK CombineVariants utility:
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_wal...
Top 20 Pathogenic Variants:

D: Analysis of Pathogenic variants
We analyzed four variants in-depth.

We looked for evidence for or against pathogenicity, including conservation, 3-D structure, experimental data (from GWAS or case-control studies, or experiments with mouse models), clinical data (if available), and population data (frequency of the variant in different populations). For this purpose, we retrieved information from the GWAS databases described below:
(i) dbGaP: dbGaP is the database of Genotypes and Phenotypes. It was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype. Such studies include genome-wide association studies (GWAS), medical sequencing, and molecular diagnostic assays, as well as studies of association between genotype and non-clinical traits. It is available at http://www.ncbi.nlm.nih.gov/gap
(ii) A Catalog of Published Genome-Wide Association Studies: This is an online catalog of SNP-trait associations from published genome-wide association studies, for use in investigating genomic characteristics of trait/disease-associated SNPs (TASs). It is available at http://www.genome.gov/gwastudies/
(iii) GWAS Central: GWAS Central (previously the Human Genome Variation database of Genotype-to-Phenotype information) is a database of summary-level findings from genetic association studies, both large and small. It is available at https://www.gwascentral.org/index
(iv) Gen2Phen (G2P): G2P is a knowledge centre that hosts the results obtained from the GEN2PHEN project. The GEN2PHEN project aims to unify human and model organism genetic variation databases towards increasingly holistic views into Genotype-To-Phenotype (G2P) data, and to link this system into other biomedical knowledge sources via genome browser functionality. It is available at http://www.gen2phen.org/
E: Functional Characterization of Variants
1. CTNND2:
Function: The gene encodes an adhesive junction protein called delta-catenin, which is implicated in brain and eye development.
Conservation: The CTNND2 gene is conserved in chimpanzee, Rhesus monkey, dog, cow, mouse, rat, chicken, zebrafish, fruit fly, and mosquito. The conserved regions are an ICP4 region, which has two broad transcriptional regulatory domains, and armadillo/beta-catenin-like repeats (a tandemly repeated sequence approximately 40 amino acids long).
Pathogenicity: CTNND2 is located on the short arm of chromosome 5, in a critical region for autism spectrum disorder (Harvard et al., 2005), mental and neurological disorders (Medina et al., 2000), and Cri-du-Chat syndrome.
Heterozygous deletion of the short arm of chromosome 5, where CTNND2 is located, has been found in Cri-du-Chat syndrome (Medina et al., 2000).
GWAS studies in Chinese populations showed that SNPs (rs6885224 and rs12716080) in the non-coding region of CTNND2, which lies inside the linkage interval of MYP16, are strongly associated with high myopia, a cause of visual impairment (Li et al., 2011; Lu et al., 2011). However, the minor allele C at rs6885224 was protective against myopia in the Lu et al. study but was associated with risk of myopia in the Li et al. study.
In addition, a rare copy number variant that disrupts CTNND2 as a result of a duplication has been associated with schizophrenia (Vrijenhoek et al., 2008).
Overexpression of CTNND2 has been seen in prostate tumors (Lu et al., 2009) and breast tumors (Bertucci et al., 2006).
References
Medina M, Marinescu RC, Overhauser J, Kosik KS. (2000). Hemizygosity of delta-catenin (CTNND2) is associated with severe mental retardation in Cri-du-Chat syndrome. Genomics; 63:157-164.
Bertucci F, Finetti P, Cervera N, et al. (2006). Gene expression profiling shows medullary breast cancer is a subgroup of basal breast cancers. Cancer Res; 66:4636-4644.
Lu Q, Zhang J, Allison R, et al. (2009). Identification of extracellular delta-catenin accumulation for prostate cancer detection. Prostate; 69:411-418.
Vrijenhoek T, Buizer-Voskamp JE, Stelt I, et al. (2008). Recurrent CNVs Disrupt Three Candidate Genes in Schizophrenia Patients. Am J Hum Genet; 83(4):504-510.
Harvard C, Malenfant P, Koochek M, Creighton S, Mickelson EC, Holden JJ, Lewis ME, et al. (2005). A variant Cri du Chat phenotype and autism spectrum disorder in a subject with de novo cryptic microdeletions involving 5p15.2 and 3p24.3-25 detected using whole genomic array CGH. Clin. Genet.; 67:341-351.
Lu B, Jiang D, Wang P, Gao Y, et al. (2011). Replication study supports CTNND2 as a susceptibility gene for high myopia. Invest Ophthalmol Vis Sci; 52:8258-8261.
Li YJ, Goh L, Khor CC, et al. (2011). Genome-wide association studies reveal genetic variants in CTNND2 for high myopia in Singapore Chinese. Ophthalmology; 118:368-375.
2. GOLGA8B:
Function: The gene encodes Golgin A8 family member B, a protein that belongs to the golgin family. Golgins constitute a family of proteins that are localized to the Golgi apparatus, and their main function is the glycosylation and transport of proteins and lipids in the secretory pathway.
Conservation: The GOLGA8B gene is conserved in mouse, dog, and elephant.
Pathogenicity: GOLGA8B is located on the complementary strand of the long arm of chromosome 15 and contains 14 exons. The gene spans 58.29 kb (NC_000015.9) and encodes a protein of 603 amino acids (NP_001018861.3). More than 50 variations have been reported for this gene, most of them located in exon 14. Evidence from a GWAS suggests that variations in GOLGA8B may be related to susceptibility to myopia and refractive errors in human populations (Solouki et al. 2010).
GWAS Report: In 2010, Solouki and collaborators published a paper entitled "A genome-wide association study identifies a susceptibility locus for refractive errors and myopia at 15q14". In this study, a cohort of 5,328 individuals from a Dutch population was screened. The researchers found a significant association (p-value: 2.21 × 10^-14) between refractive errors and a locus on chromosome 15q14 (rs634990). The odds ratio of myopia compared to hyperopia for the minor allele (minor allele frequency = 0.47) was 1.41 (95% CI 1.16-1.70) for individuals heterozygous for the allele and 1.83 (95% CI 1.42-2.36) for individuals homozygous for the allele. An interesting observation from this study is that this chromosomal position lies near genes that are expressed in the retina (GJD2 and ACTC1) and appears to harbor regulatory elements that may be involved in the transcription of these genes. The main conclusion of the study was that common variants at 15q14 might influence susceptibility to refractive errors in the general population. GOLGA8B is located in this locus and contains more than 50 SNPs described to date. However, the variation that we are reporting is a new one, in exon 3 of the gene.
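For readers unfamiliar with how an odds ratio and its 95% confidence interval relate, the textbook calculation from a 2x2 table can be sketched in Python as below. The counts are hypothetical and chosen only to illustrate the arithmetic, the function name is our own, and the published estimates may well come from a regression model rather than from this simple formula. The last lines check that the intervals quoted above are roughly symmetric around the odds ratio on the log scale, as a Wald-type interval would be.

```python
import math


def odds_ratio_wald_ci(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table of counts."""
    odds_ratio = (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)
    # The standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(
        1 / case_exposed + 1 / case_unexposed + 1 / ctrl_exposed + 1 / ctrl_unexposed
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from Solouki et al.
print(odds_ratio_wald_ci(30, 20, 20, 30))

# Consistency check on the quoted intervals: for an interval that is symmetric
# on the log scale, sqrt(lower * upper) should be close to the odds ratio.
for reported_or, lower, upper in [(1.41, 1.16, 1.70), (1.83, 1.42, 2.36)]:
    print(reported_or, round(math.sqrt(lower * upper), 2))
```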
References
1. Solouki et al. A genome-wide association study identifies a susceptibility locus for refractive errors and myopia at 15q14. 2010. Nature Genetics, 42(10): 897-903.
2. G.A. Thorisson, O. Lancaster, R.C. Free, R.K. Hastings, P. Sarmah, D. Dash, S.K. Brahmachari, A.J. Brookes. HGVbaseG2P: a central genetic association database. 2009. Nucleic Acids Research, 37: D797-802.
3. PWWP2A:
Function: The gene encodes PWWP domain-containing protein 2A. According to the Conserved Domain Database (CDD) for the functional annotation of proteins, the PWWP domain, named for a conserved Pro-Trp-Trp-Pro motif, is a small domain of 100-150 amino acids that is found in numerous proteins involved in cell division, growth and differentiation. Most PWWP-domain proteins seem to be nuclear, often DNA-binding, proteins that function as transcription factors regulating a variety of developmental processes. For example, the PWWP domain is essential in DNA methyltransferase 3B (Dnmt3b), which is responsible for establishing DNA methylation patterns during embryogenesis and gametogenesis. In tumorigenesis, DNA methylation by Dnmt3b is known to play a role in the inactivation of tumor suppressor genes. In addition, a point mutation in the PWWP domain of Dnmt3b has been identified in patients with ICF syndrome (immunodeficiency, centromeric instability, and facial anomalies), a rare autosomal recessive disorder characterized by hypomethylation of classical satellite DNA.
Conservation: The PWWP2A gene is conserved in chimpanzee, Rhesus monkey, dog, mouse, rat, chicken, and zebrafish.
Pathogenicity: The PWWP2A gene is located on the complementary strand of the "q" arm of chromosome 5 and contains 4 exons. Three transcript variants have been identified, producing three isoforms of the protein: a, b and c. There is no experimental evidence for a direct association between the gene and any disease.
GWAS Report: According to the Catalog of Published Genome-Wide Association Studies (available at www.genome.gov/gwastudies), two variations located in the intergenic region between PWWP2A and FABP6 (rs2546371 and rs4921110) have been associated with the waist-hip ratio trait (the waist circumference divided by the hip circumference). Both have the same p-value: 8.645 × 10^-5. For both men and women, a waist-to-hip ratio (WHR) of 1.0 or higher is considered "at risk" for undesirable health consequences, such as heart disease and ailments associated with being overweight. The variations identified may be part of regulatory sequences that control the expression of PWWP2A and FABP6.

Reference
1. Hindorff LA, MacArthur J (European Bioinformatics Institute), Morales J (European Bioinformatics Institute), Junkins HA, Hall PN, Klemm AK, and Manolio TA. A Catalog of Published Genome-Wide Association Studies. Available at: www.genome.gov/gwastudies. Accessed [Nov 25, 2012].
4. AGXT:
Function: The gene encodes alanine-glyoxylate aminotransferase (AGXT, also abbreviated AGT), a hepatic enzyme that converts glyoxylate to glycine. The gene is located on the long arm of chromosome 2 (2q37.3) and is expressed only in the liver. The encoded protein is functionally active in the peroxisomes, where it is involved in glyoxylate detoxification. Mutations in this gene, some of which alter subcellular targeting, have been associated with type I primary hyperoxaluria.
Conservation: The AGXT gene is conserved in chimpanzee, Rhesus monkey, dog, cow, mouse, rat, chicken, zebrafish, fruit fly, mosquito, C. elegans, S. cerevisiae, K. lactis, M. oryzae, N. crassa, A. thaliana, and rice.
Pathogenicity: Mutations in this gene, some of which alter subcellular targeting, have been associated with type I primary hyperoxaluria (PH1). Absence of AGT activity results in conversion of glyoxylate to oxalate, which cannot be degraded further. Excess oxalate is excreted in the urine, causing kidney stones (urolithiasis), nephrocalcinosis, and kidney failure. As kidney function declines, blood levels of oxalate increase markedly, and oxalate combines with calcium to form calcium oxalate deposits in the kidney, eyes, heart, bones, and other organs, resulting in systemic disease. Pyridoxine (vitamin B6), a cofactor of AGT, is effective in reducing urine oxalate excretion in some PH1 patients.
GWAS Report: According to GWAS Central (available at http://www.gwascentral.org/study/HGVST634), variations in the AGXT gene have been associated with human body height. A GWAS by Lango Allen et al. (2010) involving 183,727 individuals showed that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. Variations in AGXT were part of the set of loci identified as explaining the adult height trait. Four variations showed significant association in this study: rs12695032 (-log p = 3.47), rs5013752 (-log p = 3.19), rs4426527 (-log p = 3.25) and rs4344931 (-log p = 4.22).
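The -log p scores listed above can be converted back to p-values with a short Python sketch. It assumes the logarithm is base 10, which is the usual convention for GWAS reports but is not stated explicitly in the text. The comparison against 5 × 10^-8 uses the conventional genome-wide significance threshold, which is brought in here only as a reference point and does not come from the report itself.

```python
# -log p scores for the four AGXT variations, as listed in the text.
neg_log_p = {
    "rs12695032": 3.47,
    "rs5013752": 3.19,
    "rs4426527": 3.25,
    "rs4344931": 4.22,
}

# Assumption: scores are -log10(p), so p = 10 ** (-score).
p_values = {rsid: 10 ** -score for rsid, score in neg_log_p.items()}

GENOME_WIDE_THRESHOLD = 5e-8  # conventional cutoff, not taken from the text

for rsid, p in p_values.items():
    passes = p < GENOME_WIDE_THRESHOLD
    print(f"{rsid}\tp = {p:.2e}\tbelow 5e-8: {passes}")
```

On this reading, the strongest of the four signals (rs4344931) corresponds to a p-value of roughly 6 × 10^-5.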

References
1. Lango Allen et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. 2010. Nature 467(7317): 832-838. doi:10.1038/nature09410.
2. Hindorff L.A., Sethupathy P., Junkins H.A., Ramos E.M., Mehta J.P., Collins F.S., and Manolio T.A. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. 2009. PNAS 106(23): 9362-9367.
F: Structural Characterization of Variants
We structurally characterized two of the genes containing variants with a high likelihood of being pathogenic. These genes were AGXT and GOLGA8B.
Protein structure prediction was done by threading, using an online server called RaptorX (http://raptorx.uchicago.edu). The modeled protein was visualized using VMD, and active-site prediction was carried out using DoGSiteScorer (http://dogsite.zbh.uni-hamburg.de).
1. AGXT

Figure 1. The predicted protein structure of AGXT (alanine-glyoxylate aminotransferase) showing the hypothesized catalytic site and the amino acid chain where the substitution occurs. The variant is a cytosine to thymine change at position 32 of the nucleotide sequence, which results in a proline to leucine change at position 11 of the amino acid sequence.
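The relationship between nucleotide position 32 and amino acid position 11 can be checked with simple codon arithmetic, sketched here in Python. The sketch assumes that position 32 is counted from the first base of the start codon in the coding sequence; the function name is our own. It also confirms, using the standard genetic code, that replacing that base of any proline codon with T yields a leucine codon.

```python
def codon_position(nt_pos):
    """Map a 1-based coding-sequence position to (codon number, base within codon)."""
    codon_number = (nt_pos - 1) // 3 + 1
    base_in_codon = (nt_pos - 1) % 3 + 1
    return codon_number, base_in_codon


# Standard genetic code: all proline codons and all leucine codons.
PRO_CODONS = {"CCT", "CCC", "CCA", "CCG"}
LEU_CODONS = {"CTT", "CTC", "CTA", "CTG", "TTA", "TTG"}

codon, base = codon_position(32)

# Replace the affected base of every proline codon with T.
mutated = {c[: base - 1] + "T" + c[base:] for c in PRO_CODONS}

print(codon, base, mutated <= LEU_CODONS)  # prints: 11 2 True
```

Position 32 is the second base of codon 11, and a C to T change at the second base turns CCN (proline) into CTN (leucine), which agrees with the substitution described in the caption.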

Figure 2. Comparative analysis of the catalytic domain of the modeled AGXT protein (B) with the template (A), which shows the known active site. The C to T variant in the AGXT sequence changes proline (nonpolar) to leucine (aliphatic) in the vicinity of the catalytic domain. This could affect the enzymatic function of the protein, which is mainly involved in the glyoxylate metabolic pathway in hepatic cells. The template (A) shows experimental ligands bound to a very large active site (white arrows), suggesting that the catalytic domain may also function in protein-protein interactions.
2. GOLGA8B

Figure 3. The predicted threading model for the GOLGA8B fragment containing the mutation. GOLGA8B is a Golgin A8 family member. It carries the C to T variant at nucleotide position 178, leading to a change from arginine (positively charged) to threonine (polar, uncharged) (white arrows). The mutation falls within the predicted active site (4th-ranked cluster), suggesting a change in possible catalytic interactions.
Source: http://gtbinf.wordpress.com/2012/11/30/exome-analysis-pipeline-for-rare-variant-calling-prioritization-and-disease-association-in-exome-of-an-individual-from-colombian-descent/