Degenerate PCR, a short guide.
Contents:
What is degenerate PCR?
Why use degenerate PCR?
Requirements. (What kind of sequence information do you need to get started?)
cDNA or genomic DNA?
How degenerate can PCR primers be and still function?
How to choose the PCR conditions.
What types of genes are "easy" to find by degenerate PCR?
Implications.
What is degenerate PCR?
Degenerate PCR is in most respects identical to ordinary PCR, with one major difference: instead of specific PCR primers with a single given sequence, you use mixed PCR primers. That is, if you do not know the exact sequence of the gene you are going to amplify, you insert "wobbles" in the PCR primers at each position where there is more than one possibility. For instance, if all you have is a protein motif, you can back-translate it to the corresponding nucleotide motif (protein --> DNA sequence).
Example of a degenerate PCR primer designed after a protein motif:
   Trp Asp Thr Ala Gly Gln Glu
5' TGG GAY ACN GCN GGN CAR GA 3'
where Y = C or T, R = G or A, and N = G, A, T or C. This gives a mix of 256 different oligonucleotides. The more wobbles you introduce in the PCR primer, the more degenerate it gets.
(The degeneracy of the primer is produced during DNA synthesis, so you do not need to order 256 different primers to get a 256-fold mix. That would be a lot of paperwork, and expensive!)
Degenerate nucleotide codes: R=AG, Y=CT, M=AC, K=GT, W=AT, S=CG, B=CGT, D=AGT, H=ACT, V=ACG, N=ACGT.
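The back-translation and the 256-fold figure above can be checked with a few lines of Python. This is only a minimal sketch: the function names (`back_translate`, `degeneracy`, `expand`) and the small `BACK_TRANSLATION` table are illustrative choices, not part of any standard tool, and the table covers only the seven residues of the example motif.

```python
from itertools import product

# Degenerate nucleotide codes, as listed above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Degenerate codons for the residues of the example motif only
# (one-letter amino acid codes; an illustrative subset, not a full table).
BACK_TRANSLATION = {
    "W": "TGG", "D": "GAY", "T": "ACN", "A": "GCN",
    "G": "GGN", "Q": "CAR", "E": "GAR",
}

def back_translate(peptide):
    """Protein motif -> degenerate nucleotide motif."""
    return "".join(BACK_TRANSLATION[aa] for aa in peptide)

def degeneracy(primer):
    """Number of different oligonucleotides in the mix."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every individual sequence contained in the degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]

# Trp Asp Thr Ala Gly Gln Glu; the last wobble base is dropped,
# as in the example primer, so the 3' end is not degenerate.
primer = back_translate("WDTAGQE")[:-1]
print(primer, degeneracy(primer))
```

Dropping the final wobble position costs nothing in coverage of the motif and halves the size of the mix.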
Why use degenerate PCR?
Degenerate PCR has proven to be a very powerful tool for finding "new" genes or gene families. Most genes come in families that share structural similarities. By aligning the protein sequences of a number of related proteins, you can see which parts of the protein are conserved and which are variable. Conserved protein motifs found this way can be used as a starting point for making degenerate PCR primers.
Degenerate PCR can be used to "solve" a number of problems.
These are just a few examples of the kinds of problems you can apply this technique to.
Requirements. (What kind of sequence information do you need to get started?)
The protein motif does not have to be 100% conserved. Sometimes a partially conserved protein motif is sufficient. Examples of commonly found substitutions are Glu <--> Asp and Arg <--> Lys. The degenerate "codon" GAN covers both Glu and Asp. Similarly, if you use the MGN codon (M = C or A) where you know there should be a basic amino acid (Arg or Lys), it covers the Arg codons CGN and AGR and partially covers the Lys codon AAR. If there is a Lys residue, you will however have a G/T mismatch at base number 2 of the codon. This is normally no problem as long as the mismatch occurs in the middle or the 5' part of the primer. (Remember your biochemistry: the enol form of thymine can base pair with guanine.)
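Which amino acids a degenerate codon actually covers can be checked against the standard genetic code. The sketch below builds the codon table from the usual compact 64-letter string; the helper name `encoded_amino_acids` is made up for this illustration.

```python
from itertools import product

# Only the degenerate codes needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "M": "AC", "N": "ACGT"}

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def encoded_amino_acids(degenerate_codon):
    """Set of amino acids (one-letter codes) covered by a degenerate codon."""
    codons = ("".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon)))
    return {CODON_TABLE[codon] for codon in codons}

print(sorted(encoded_amino_acids("GAN")))  # Asp (D) and Glu (E)
print(sorted(encoded_amino_acids("MGN")))  # Arg (R), plus Ser (S) from AGY
print(sorted(encoded_amino_acids("AAR")))  # Lys (K) only
```

Note that MGN also matches the two serine codons AGT and AGC. That is harmless for priming, but it is a reminder that a degenerate codon can cover more than the residues you had in mind.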
If the N-terminal sequence is 20-30 amino acids long, it is often possible to make two degenerate primers and PCR up a 50-90 bp cDNA fragment, which you can use as a probe to screen a cDNA library. Alternatively, you can make two degenerate primers and try a 3' RACE to amplify the rest of the cDNA.
Hint: The easiest way is normally to PCR up a fragment of the N-terminus, sequence this fragment and then make specific primers for 3' RACE.
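When two degenerate primers are made from one stretch of protein sequence, the downstream primer has to be synthesized as the reverse complement of its back-translated motif, and the wobble codes must be complemented too (R <--> Y, M <--> K, B <--> V, D <--> H; W, S and N stay the same). A minimal sketch, with `reverse_complement` as an illustrative helper name and the primer from the earlier example reused purely as input:

```python
# Complement of each degenerate nucleotide code.
COMPLEMENT = str.maketrans("ACGTRYMKWSBDHVN", "TGCAYRKMWSVHDBN")

def reverse_complement(primer):
    """Reverse complement of a degenerate primer, written 5' -> 3'."""
    return primer.translate(COMPLEMENT)[::-1]

sense = "TGGGAYACNGCNGGNCARGA"
antisense = reverse_complement(sense)
print(antisense)
```

The degeneracy of the mix is unchanged by this operation, since every code is replaced by one covering the same number of bases.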
How degenerate can PCR primers be and still function?
1,000- to 10,000-fold degeneracy is not uncommon.
The degeneracy of the primers can be kept "down" by substituting four-base wobbles with inosines. (Example: GGI instead of GGN.)
Example: motif CVGG(M/L)NRRP (found in p53 proteins).
Without inosines (131,072-fold mix):
5' TGY GTN GGN GGN MTN AAY MGN MGN CC 3'
With inosines (512-fold mix):
5' TGY GTI GGI GGI MTI AAY MGN MGN CC 3'
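The two mix sizes follow from multiplying the number of alternatives at each position, with inosine counted as a single synthesized base. A short sketch that reproduces both figures (`pool_size` is an illustrative name):

```python
# Number of alternative bases each code stands for.
# Inosine (I) is one defined base in the synthesized oligo, so it counts as 1.
ALTERNATIVES = {
    "A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
    "R": 2, "Y": 2, "M": 2, "K": 2, "W": 2, "S": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

def pool_size(primer):
    """Number of distinct oligonucleotides in a degenerate primer mix."""
    size = 1
    for base in primer.replace(" ", ""):
        size *= ALTERNATIVES[base]
    return size

without_inosine = pool_size("TGY GTN GGN GGN MTN AAY MGN MGN CC")
with_inosine = pool_size("TGY GTI GGI GGI MTI AAY MGN MGN CC")
print(without_inosine, with_inosine)
```

Each N replaced by I divides the mix by four, so the four substitutions here reduce it 256-fold.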
What types of genes are "easy" to find by degenerate PCR?
Many proteins have structural similarities with other proteins and often share a common evolutionary origin.
Proteins with ancient conserved motifs (ACMs) are in general "easy" to find. More than 500 families of proteins with ACMs are known! (Some of these families are huge: the Ser/Thr/Tyr kinases in human number more than 500 genes.) As of this year, 2002, the complete sequences of 8 eukaryotic genomes are known (human, Drosophila melanogaster (fruit fly), Anopheles gambiae (mosquito), C. elegans (nematode), S. cerevisiae (yeast), Schizosaccharomyces pombe (yeast), Arabidopsis thaliana (plant) and Plasmodium falciparum (protist), with Giardia lamblia expected pretty soon). In addition, tens of bacterial genomes are completed. These genomes provide a wealth of information about the evolution of various gene families and can be used as a starting point for finding genes in even more obscure organisms.
Start by making a protein alignment of your protein of interest. Include as many proteins as you can find. If the protein is not well conserved, try to find regions that have at least some conserved amino acids, and if you know the sequence from a closely related organism, use it as a "guide sequence". Sometimes you can gamble on the sequence and get lucky.
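Picking out candidate motifs from an alignment can be sketched as a scan for runs of columns where every sequence has the same residue. The alignment below is invented toy data for illustration only, and `conserved_runs` is a hypothetical helper; in practice you would also want to tolerate partially conserved columns, as discussed earlier.

```python
def conserved_runs(alignment, min_len=6):
    """Return (start, motif) for each run of fully conserved columns.

    `alignment` is a list of equal-length aligned protein sequences;
    columns containing a gap ('-') are treated as not conserved.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None
    # One extra iteration past the end closes a run that reaches the last column.
    for i in range(length + 1):
        column = {seq[i] for seq in alignment} if i < length else set()
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                runs.append((start, alignment[0][start:i]))
            start = None
    return runs

# Invented toy alignment, not real protein sequences.
toy_alignment = [
    "MKWDTAGQERFA",
    "MRWDTAGQEKYA",
    "LKWDTAGQERFS",
]
motifs = conserved_runs(toy_alignment)
print(motifs)
```

A run of six or seven residues is enough for a primer of about 20 nucleotides; among several candidate runs, the one with the lowest back-translated degeneracy is usually the better starting point.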
Implications:
By using degenerate PCR you can find most genes from yeast and animals irrespective of organism (cow, frog, snail, beetle, worm or fungus). Problems may arise if you try to catch fast-evolving genes. Otherwise, you are pretty sure to find what you are looking for. The case may be a bit harder if you look for genes in protists such as the cryptomonads, where many genes have undergone massive genetic drift and have changed a lot compared to other eukaryotes. Apart from that, limitations are in general relatively few.