Single-nucleotide polymorphism

From Wikipedia, the free encyclopedia
Jump to: navigation, search
The upper DNA molecule differs from the lower DNA molecule at a single base-pair location (a C/A polymorphism).

A single nucleotide polymorphism, often abbreviated to SNP (pronounced snip; plural snips), is a variation in a single nucleotide that occurs at a specific position in the genome, where each variation is present to some appreciable degree within a population (e.g. >1%).[1]

For example, at a specific base position in the human genome, the base C may appear in most individuals, but in a minority of individuals, the position is occupied by base A. There is a SNP at this specific base position, and the two possible nucleotide variations - C or A - are said to be alleles for this base position. Although in this example and most SNPs so far discovered there are only two different alleles, there are also triallelic SNPs in which three different base variations may coexist within a population.[2]

SNPs underlie differences in our susceptibility to disease; a wide range of human diseases, e.g. sickle-cell anemia, β-thalassemia and cystic fibrosis result from SNPs.[3][4][5] The severity of illness and the way our body responds to treatments are also manifestations of genetic variations. For example, a single base mutation in the APOE (apolipoprotein E) gene is associated with a higher risk for Alzheimer's disease.[6]

Types[edit]

Types of SNPs
Types of Single Nucleotide Polymorphism (SNPs)

Single-nucleotide polymorphisms may fall within coding sequences of genes, non-coding regions of genes, or in the intergenic regions (regions between genes). SNPs within a coding sequence do not necessarily change the amino acid sequence of the protein that is produced, due to degeneracy of the genetic code.

SNPs in the coding region are of two types, synonymous and nonsynonymous SNPs. Synonymous SNPs do not affect the protein sequence while nonsynonymous SNPs change the amino acid sequence of protein. The nonsynonymous SNPs are of two types: missense and nonsense.

SNPs that are not in protein-coding regions may still affect gene splicing, transcription factor binding, messenger RNA degradation, or the sequence of non-coding RNA. Gene expression affected by this type of SNP is referred to as an eSNP (expression SNP) and may be upstream or downstream from the gene.

Applications of SNPs[edit]

  • Association studies can determine whether a genetic variant is associated with a disease or trait.[7]
  • A tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium (the non-random association of alleles at two or more loci). Tag SNPs are useful in whole-genome SNP association studies in which hundreds of thousands of SNPs across the entire genome are genotyped.
  • Haplotype mapping: sets of alleles or DNA sequences can be clustered so that a single SNP can identify many linked SNPs.
  • Linkage Disequilibrium (LD), a term used in population genetics, indicates non-random association of alleles at two or more loci, not necessarily on the same chromosome. It refers to the phenomenon that SNP allele or DNA sequence which are close together in the genome tend to be inherited together. LD is affected by two parameters: 1) The distance between the SNPs [the larger the distance the lower the LD]. 2) Recombination rate [the lower the recombination rate the higher the LD].[8]

Frequency[edit]

Within a genome[edit]

The genomic distribution of SNPs is not homogenous; SNPs occur in non-coding regions more frequently than in coding regions or, in general, where natural selection is acting and 'fixing' the allele (eliminating other variants) of the SNP that constitutes the most favorable genetic adaptation.[9] Other factors, like genetic recombination and mutation rate, can also determine SNP density.[10]

SNP density can be predicted by the presence of microsatellites: AT microsatellites in particular are potent predictors of SNP density, with long (AT)(n) repeat tracts tending to be found in regions of significantly reduced SNP density and low GC content.[11]

Within a population[edit]

There are variations between human populations, so a SNP allele that is common in one geographical or ethnic group may be much rarer in another. Within a population, SNPs can be assigned a minor allele frequency — the lowest allele frequency at a locus that is observed in a particular population. This is simply the lesser of the two allele frequencies for single-nucleotide polymorphisms.

Importance[edit]

Variations in the DNA sequences of humans can affect how humans develop diseases and respond to pathogens, chemicals, drugs, vaccines, and other agents. SNPs are also critical for personalized medicine.[12]

Biomedical research[edit]

SNPs' greatest importance in biomedical research is for comparing regions of the genome between cohorts (such as with matched cohorts with and without a disease) in genome-wide association studies. SNPs have been used in genome-wide association studies as high-resolution markers in gene mapping related to diseases or normal traits. SNPs without an observable impact on the phenotype (so called silent mutations) are still useful as genetic markers in genome-wide association studies, because of their quantity and the stable inheritance over generations.[13]

Forensics[edit]

SNP's were used initially for matching a forensic DNA sample to a suspect but it has been phased out with development of STR based DNA fingerprinting technique.[14] In future SNP's may be used in forensics for some phenotypic clues like eye color, hair color, ethnicity etc. Kidd etal has demonstrated that a panel of 19 SNP can identify the ethenic group with good probability of match (Pm = 10^-7) in 40 population groups studied.[15]

In a situation of low amount of forensic sample or degraded sample, the SNP can be a good alternative to STR because of abundance of potential markers, amenability to automation, and potential reduction in required fragment length to only 60-80 bp.[16] In absence of a STR match in DNA profile database; different SNPs can be used to get clues regarding ethnicity, phenotype, linease and even identity.

Pharmacogenetics[edit]

Some SNPs are associated with the metabolism of different drugs.[17][18][19] The association of a wide range of human diseases like cancer, infectious diseases (AIDS, leprosy, hepatitis, etc.) autoimmune, neuropsychiatric and many other diseases with different SNPs can be made as relevant pharmacogenomic targets for drug therapy.[20]

Disease[edit]

A single SNP may cause a Mendelian disease, though for complex diseases, SNPs do not usually function individually, rather, they work in coordination with other SNPs to manifest a disease condition as has been seen in Osteoporosis.[21]

All types of SNPs can have an observable phenotype or can result in disease:

Examples[edit]

Databases[edit]

As there are for genes, bioinformatics databases exist for SNPs.

  • dbSNP is a SNP database from the National Center for Biotechnology Information (NCBI). As of 8 June 2015, dbSNP listed 149,735,377 SNPs in humans.[32][33]
  • Kaviar[34] is a compendium of SNPs from multiple data sources including dbSNP.
  • SNPedia is a wiki-style database supporting personal genome annotation, interpretation and analysis.
  • The OMIM database describes the association between polymorphisms and diseases (e.g., gives diseases in text form)
  • The Human Gene Mutation Database provides gene mutations causing or associated with human inherited diseases and functional SNPs
  • The International HapMap Project, where researchers are identifying Tag SNP to be able to determine the collection of haplotypes present in each subject.
  • GWAS Central allows users to visually interrogate the actual summary-level association data in one or more genome-wide association studies.

The International SNP Map working group mapped the sequence flanking each SNP by alignment to the genomic sequence of large-insert clones in Genebank. These alignments were converted to chromosomal coordinates that is shown in Table 1.[35]

Chromosome Length(bp) All SNPs TSC SNPs
Total SNPs kb per SNP Total SNPs kb per SNP
1 214,066,000 129,931 1.65 75,166 2.85
2 222,889,000 103,664 2.15 76,985 2.90
3 186,938,000 93,140 2.01 63,669 2.94
4 169,035,000 84,426 2.00 65,719 2.57
5 170,954,000 117,882 1.45 63,545 2.69
6 165,022,000 96,317 1.71 53,797 3.07
7 149,414,000 71,752 2.08 42,327 3.53
8 125,148,000 57,834 2.16 42,653 2.93
9 107,440,000 62,013 1.73 43,020 2.50
10 127,894,000 61,298 2.09 42,466 3.01
11 129,193,000 84,663 1.53 47,621 2.71
12 125,198,000 59,245 2.11 38,136 3.28
13 93,711,000 53,093 1.77 35,745 2.62
14 89,344,000 44,112 2.03 29,746 3.00
15 73,467,000 37,814 1.94 26,524 2.77
16 74,037,000 38,735 1.91 23,328 3.17
17 73,367,000 34,621 2.12 19,396 3.78
18 73,078,000 45,135 1.62 27,028 2.70
19 56,044,000 25,676 2.18 11,185 5.01
20 63,317,000 29,478 2.15 17,051 3.71
21 33,824,000 20,916 1.62 9,103 3.72
22 33,786,000 28,410 1.19 11,056 3.06
X 131,245,000 34,842 3.77 20,400 6.43
Y 21,753,000 4,193 5.19 1,784 12.19
RefSeq 15,696,674 14,534 1.08
Totals 2,710,164,000 1,419,190 1.91 887,450 3.05

Nomenclature[edit]

The nomenclature for SNPs can be confusing: several variations can exist for an individual SNP and consensus has not yet been achieved. One approach is to write SNPs with a prefix, period and "greater than" sign showing the wild-type and altered nucleotide or amino acid; for example, c.76A>T.[36][37][38] SNPs are frequently referred to by their dbSNP rs number, as in the examples above.

SNP analysis[edit]

SNPs are usually biallelic and thus easily assayed.[39] Analytical methods to discover novel SNPs and detect known SNPs include:

Programs for prediction of SNP effects[edit]

An important group of SNPs are those that corresponds to missense mutations causing amino acid change on protein level. Point mutation of particular residue can have different effect on protein function (from no effect to complete disruption its function). Usually, change in amino acids with similar size and physico-chemical properties (e.g. substitution from leucine to valine) has mild effect, and opposite. Similarly, if SNP disrupts secondary structure elements (e.g. substitution to proline in alpha helix region) such mutation usually may affect whole protein structure and function. Using those simple and many other machine learning derived rules a group of programs for the prediction of SNP effect was developed:

See also[edit]

Notes[edit]

  1. ^ "single nucleotide polymorphism / SNP | Learn Science at Scitable". www.nature.com. Retrieved 2015-11-13. 
  2. ^ Hodgkinson, Alan; Eyre-Walker, Adam (November 2, 2009). "Human Triallelic Sites: Evidence for a New Mutational Mechanism?". Genetics 1. doi:10.4172/2157-7145.1000107. 
  3. ^ Ingram, V. M. (1956). "A specific chemical difference between the globins of normal human and sickle-cell anaemia haemoglobin". Nature 178 (4537): 792–794. doi:10.1038/178792a0. PMID 13369537. 
  4. ^ Chang, J. C.; Kan, Y. W. (1979). "Beta 0 thalassemia, a nonsense mutation in man". Proceedings of the National Academy of Sciences of the United States of America 76 (6): 2886–2889. doi:10.1073/pnas.76.6.2886. PMC 383714. PMID 88735. 
  5. ^ Hamosh, A.; King, T. M.; Rosenstein, B. J.; Corey, M.; Levison, H.; Durie, P.; Tsui, L. C.; McIntosh, I.; Keston, M.; Brock, D. J.; Macek, M.; Zemková, D.; Krásničanová, H.; Vávrová, V.; Macek, M.; Golder, N.; Schwarz, M. J.; Super, M.; Watson, E. K.; Williams, C.; Bush, A.; O'Mahoney, S. M.; Humphries, P.; Dearce, M. A.; Reis, A.; Bürger, J.; Stuhrmann, M.; Schmidtke, J.; Wulbrand, U.; Dörk, T. (1992). "Cystic fibrosis patients bearing both the common missense mutation Gly----Asp at codon 551 and the delta F508 mutation are clinically indistinguishable from delta F508 homozygotes, except for decreased risk of meconium ileus". American Journal of Human Genetics 51 (2): 245–250. PMC 1682672. PMID 1379413. 
  6. ^ Wolf, A. B.; Caselli, R. J.; Reiman, E. M.; Valla, J. (2012). "APOE and neuroenergetics: An emerging paradigm in Alzheimer's disease". Neurobiology of Aging 34 (4): 1007–17. doi:10.1016/j.neurobiolaging.2012.10.011. PMID 23159550. 
  7. ^ http://genome.cshlp.org/content/14/5/908.full
  8. ^ https://www.researchgate.net/publication/229085137_Single_nucleotide_polymorphisms_a_new_paradigm_for_molecular_marker_technology_and_DNA_polymorphism_detection_with_emphasis_on_their_use_in_plantsPK_Gupta_JK_Roy_M_PrasadCurr_Sci_80_%284%29_524-35
  9. ^ Barreiro LB; Laval G; Quach H; Patin E; Quintana-Murci L. (2008). "Natural selection has driven population differentiation in modern humans". Nature Genetics 40 (3): 340–345. doi:10.1038/ng.78. PMID 18246066. 
  10. ^ Nachman, Michael W. (2001). "Single nucleotide polymorphisms and recombination rate in humans". Trends in Genetics 17 (9): 481–485. doi:10.1016/S0168-9525(01)02409-X. PMID 11525814. 
  11. ^ M.A. Varela & W. Amos (2010). "Heterogeneous distribution of SNPs in the human genome: Microsatellites as predictors of nucleotide diversity and divergence". Genomics 95 (3): 151–159. doi:10.1016/j.ygeno.2009.12.003. PMID 20026267. 
  12. ^ Carlson, Bruce (2008-06-15). "SNPs — A Shortcut to Personalized Medicine". Genetic Engineering & Biotechnology News (Mary Ann Liebert, Inc.) 28 (12). Retrieved 2008-07-06. (subtitle) Medical applications are where the market's growth is expected 
  13. ^ Thomas, P. E.; Klinger, R.; Furlong, L. I.; Hofmann-Apitius, M.; Friedrich, C. M. (2011). "Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers". BMC Bioinformatics 12: S4. doi:10.1186/1471-2105-12-S4-S4. PMC 3194196. PMID 21992066. 
  14. ^ Butler, John M. (2010). Fundamentals of forensic DNA typing. Burlington, MA: Elsevier/Academic Press. ISBN 9780080961767. 
  15. ^ Kidd, KK; Pakstis, AJ; Speed, WC; Grigorenko, EL; Kajuna, SL; Karoma, NJ; Kungulilo, S; Kim, JJ; Lu, RB; Odunsi, A; Okonofua, F; Parnas, J; Schulz, LO; Zhukova, OV; Kidd, JR (1 December 2006). "Developing a SNP panel for forensic identification of individuals.". Forensic Science International 164 (1): 20–32. doi:10.1016/j.forsciint.2005.11.017. PMID 16360294. 
  16. ^ Budowle, B; van Daal, A (April 2008). "Forensically relevant SNP classes.". BioTechniques 44 (5): 603–8, 610. doi:10.2144/000112806. PMID 18474034. 
  17. ^ Goldstein, J. A. (2001). "Clinical relevance of genetic polymorphisms in the human CYP2C subfamily". British Journal of Clinical Pharmacology 52 (4): 349–355. doi:10.1046/j.0306-5251.2001.01499.x. PMC 2014584. PMID 11678778. 
  18. ^ Lee, C. R. (2004). "CYP2C9 genotype as a predictor of drug disposition in humans". Methods and findings in experimental and clinical pharmacology 26 (6): 463–472. PMID 15349140. 
  19. ^ Yanase, K.; Tsukahara, S.; Mitsuhashi, J.; Sugimoto, Y. (2006). "Functional SNPs of the breast cancer resistance protein ‐ therapeutic effects and inhibitor development". Cancer Letters 234 (1): 73–80. doi:10.1016/j.canlet.2005.04.039. PMID 16303243. 
  20. ^ Fareed, M.; Afzal, M. (2013). "Single nucleotide polymorphism in genome-wide association of human population: A tool for broad spectrum service". Egyptian Journal of Medical Human Genetics 14 (2): 123–134. doi:10.1016/j.ejmhg.2012.08.001. 
  21. ^ Singh, Monica; Singh, Puneetpal; Juneja, Pawan Kumar; Singh, Surinder; Kaur, Taranpal (2010). "SNP–SNP interactions within APOE gene influence plasma lipids in postmenopausal osteoporosis". Rheumatology International 31 (3): 421–3. doi:10.1007/s00296-010-1449-7. PMID 20340021. 
  22. ^ Li, G.; Pan, T.; Guo, D.; Li, L. C. (2014). "Regulatory Variants and Disease: The E-Cadherin -160C/A SNP as an Example". Mol Biol Int 2014: 967565. doi:10.1155/2014/967565. PMC 4167656. PMID 25276428. 
  23. ^ Lu, Yi-Fan; Mauger, David M.; Goldstein, David B.; Urban, Thomas J.; Weeks, Kevin M.; Bradrick, Shelton S. (4 November 2015). "IFNL3 mRNA structure is remodeled by a functional non-coding polymorphism associated with hepatitis C virus clearance". Scientific Reports 5: 16037. doi:10.1038/srep16037. PMID 26531896. 
  24. ^ Kimchi-Sarfaty, C.; Oh, JM.; Kim, IW.; Sauna, ZE.; Calcagno, AM.; Ambudkar, SV.; Gottesman, MM. (Jan 2007). "A "silent" polymorphism in the MDR1 gene changes substrate specificity". Science 315 (5811): 525–8. doi:10.1126/science.1135308. PMID 17185560. 
  25. ^ Al-Haggar M; Madej-Pilarczyk A; Kozlowski L; Bujnicki JM; Yahia S; Abdel-Hadi D; Shams A; Ahmad N; Hamed S; Puzianowska-Kuznicka M (2012). "A novel homozygous p.Arg527Leu LMNA mutation in two unrelated Egyptian families causes overlapping mandibuloacral dysplasia and progeria syndrome". Eur J Hum Genet. 20 (11): 1134–40. doi:10.1038/ejhg.2012.77. PMC 3476705. PMID 22549407. 
  26. ^ Cordovado, SK.; Hendrix, M.; Greene, CN.; Mochal, S.; Earley, MC.; Farrell, PM.; Kharrazi, M.; Hannon, WH.; Mueller, PW. (Feb 2012). "CFTR mutation analysis and haplotype associations in CF patients". Mol Genet Metab 105 (2): 249–54. doi:10.1016/j.ymgme.2011.10.013. PMC 3551260. PMID 22137130. 
  27. ^ Giegling I; Hartmann AM; Möller HJ; Rujescu D (November 2006). "Anger- and aggression-related traits are associated with polymorphisms in the 5-HT-2A gene". Journal of Affective Disorders 96 (1–2): 75–81. doi:10.1016/j.jad.2006.05.016. PMID 16814396. 
  28. ^ Kujovich, J. L. (Jan 2011). "Factor V Leiden thrombophilia". Genet Med 13 (1): 1–16. doi:10.1097/GIM.0b013e3181faa0f2. PMID 21116184. 
  29. ^ Morita, Akihiko; Nakayama, Tomohiro; Doba, Nobutaka; Hinohara, Shigeaki; Mizutani, Tomohiko; Soma, Masayoshi (2007). "Genotyping of triallelic SNPs using TaqMan PCR". Molecular and Cellular Probes 21 (3): 171–6. doi:10.1016/j.mcp.2006.10.005. PMID 17161935. 
  30. ^ Prodi, D.A.; Drayna, D; Forabosco, P; Palmas, MA; Maestrale, GB; Piras, D; Pirastu, M; Angius, A (2004). "Bitter Taste Study in a Sardinian Genetic Isolate Supports the Association of Phenylthiocarbamide Sensitivity to the TAS2R38 Bitter Receptor Gene". Chemical Senses 29 (8): 697–702. doi:10.1093/chemse/bjh074. PMID 15466815. 
  31. ^ Ammitzbøll, Christian Gytz; Kjær, Troels Rønn; Steffensen, Rudi; Stengaard-Pedersen, Kristian; Nielsen, Hans Jørgen; Thiel, Steffen; Bøgsted, Martin; Jensenius, Jens Christian (28 November 2012). "Non-Synonymous Polymorphisms in the FCN1 Gene Determine Ligand-Binding Ability and Serum Levels of M-Ficolin". PLoS ONE 7 (11): e50585. doi:10.1371/journal.pone.0050585. PMC 3509001. PMID 23209787. 
  32. ^ National Center for Biotechnology Information, United States National Library of Medicine. 2014. NCBI dbSNP build 142 for human. http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2014q4/000147.html
  33. ^ National Center for Biotechnology Information, United States National Library of Medicine. 2015. NCBI dbSNP build 144 for human. Summary Page. http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi?view+summary=view+summary&build_id=144
  34. ^ Glusman, G; Caballero, J; Mauldin, D. E.; Hood, L; Roach, J. C. (2011). "Kaviar: An accessible system for testing SNV novelty". Bioinformatics 27 (22): 3216–7. doi:10.1093/bioinformatics/btr540. PMC 3208392. PMID 21965822. 
  35. ^ Sachidanandam, R.; Weissman, D.; Schmidt, S. C.; Kakol, J. M.; Stein, L. D.; Marth, G.; Sherry, S.; Mullikin, J. C.; Mortimore, B. J.; Willey, D. L.; Hunt, S. E.; Cole, C. G.; Coggill, P. C.; Rice, C. M.; Ning, Z.; Rogers, J.; Bentley, D. R.; Kwok, P. Y.; Mardis, E. R.; Yeh, R. T.; Schultz, B.; Cook, L.; Davenport, R.; Dante, M.; Fulton, L.; Hillier, L.; Waterston, R. H.; McPherson, J. D.; Gilman, B.; Schaffner, S. (2001). "A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms". Nature 409 (6822): 928–933. doi:10.1038/35057149. PMID 11237013. 
  36. ^ J.T. Den Dunnen (2008-02-20). "Recommendations for the description of sequence variants". Human Genome Variation Society. Retrieved 2008-09-05. 
  37. ^ den Dunnen, Johan T.; Antonarakis, Stylianos E. (2000). "Mutation nomenclature extensions and suggestions to describe complex mutations: A discussion". Human Mutation 15 (1): 7–12. doi:10.1002/(SICI)1098-1004(200001)15:1<7::AID-HUMU4>3.0.CO;2-N. PMID 10612815. 
  38. ^ Ogino, Shuji; Gulley, Margaret L.; Den Dunnen, Johan T.; Wilson, Robert B.; Association for Molecular Pathology Training and Education Committee (2007). "Standard Mutation Nomenclature in Molecular DiagnosticsPractical and Educational Challenges". The Journal of Molecular Diagnostics 9 (1): 1–6. doi:10.2353/jmoldx.2007.060081. PMC 1867422. PMID 17251329. 
  39. ^ Sachidanandam, Ravi; Weissman, David; Schmidt, Steven C.; Kakol, Jerzy M.; Stein, Lincoln D.; Marth, Gabor; Sherry, Steve; Mullikin, James C.; et al. (2001). "A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms". Nature 409 (6822): 928–33. doi:10.1038/35057149. PMID 11237013. 
  40. ^ Altshuler, D; Pollara, V J; Cowles, C R; Van Etten, W J; Baldwin, J; Linton, L; Lander, E S (2000). "An SNP map of the human genome generated by reduced representation shotgun sequencing". Nature 407 (6803): 513–6. doi:10.1038/35035083. PMID 11029002. 
  41. ^ Drabovich, A.P.; Krylov, S.N. (2006). "Identification of base pairs in single-nucleotide polymorphisms by MutS protein-mediated capillary electrophoresis". Analytical Chemistry 78 (6): 2035–8. doi:10.1021/ac0520386. PMID 16536443. 
  42. ^ Griffin, T J; Smith, L M (2000). "Genetic identification by mass spectrometric analysis of single-nucleotide polymorphisms: ternary encoding of genotypes". Analytical Chemistry 72 (14): 3298–302. doi:10.1021/ac991390e. PMID 10939403. 
  43. ^ Tahira, T.; Kukita, Y.; Higasa, K.; Okazaki, Y.; Yoshinaga, A.; Hayashi, K. (2009). "Estimation of SNP allele frequencies by SSCP analysis of pooled DNA". Methods Mol Biol. Methods in Molecular Biology 578: 193–207. doi:10.1007/978-1-60327-411-1_12. ISBN 978-1-60327-410-4. PMID 19768595. 

References[edit]

External links[edit]