Troubleshooting

This page contains information that we have assembled to help you solve problems that you may experience. We recommend reading this information even if you don't think that you are having problems, since it may help you to obtain better sequences.

What to do when the sequence isn't good
Why can't I open the files?
All I see is 5 "N"s...
The most common reasons for bad data
The "flat line" or "DOA - Dead on Analysis"
Truncated reads
Inaccurate data
Reading near the primer
Multiple sequences
Unusual base composition of templates

What to do when the sequence isn't good

We provide you with a lot of information when you get your sequence. In addition to the sequence itself as a text file, you will get the electropherogram file and the sample reference spreadsheet. First, use the sample sheet to check that the samples you have carry the right numbers (mistakes do happen!). Next, use a chromatogram viewing application to view the actual data. If all you see is 5 "N"s, the base calling program did not find any data worth analysing. In that case the raw data will show no good sequence peaks, and the annotation page will show that the signal intensities for the individual bases are all very low (typically less than 50). If you get sequence, but it is not good or it stops quickly, this indicates a problem as well. The chromatogram file contains a lot of information that will help you decide on the most appropriate course of action, and it is vital that you look at it. If you still have no idea what to do, first read the rest of this page (it may answer your questions!), then contact us for advice.

Why can't I open the files?

We store the data on our server in compressed data archives. These archives have a ".zip" file extension and require a piece of software to expand them so that the results files are available. Most computers should be able to expand these archives using software that is already installed. However, if you are having problems, free software is available to do this.
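
If you prefer to expand the archive from a script, the following is a minimal sketch using Python's standard zipfile module. The function name and the file names in the usage comment are hypothetical; substitute the archive you actually downloaded.

```python
import zipfile
from pathlib import Path


def expand_archive(archive_path, destination):
    """Expand a .zip archive into `destination` and return the member names."""
    destination = Path(destination)
    # Create the output folder if it does not exist yet.
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(destination)
        return archive.namelist()


# Hypothetical usage (file and folder names are placeholders):
# for name in expand_archive("results.zip", "results"):
#     print(name)
```

Only expand archives that come from a source you trust, as with any downloaded file.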

All I see is 5 "N"s...

The base caller we use will produce just 5 "N"s if it decides that there is nothing worth analysing. It is usually (but not always) correct in its decision. The most common cause of analysis failure is a lack of signal, so if the raw data shows no sequence peaks and the annotation shows that the signal intensities for the bases are very low (typically less than 50), then this is the problem. Take a look below for more help. If there are visible products (especially very short products, but not just dye terminators), then please contact us and we will attempt to reanalyse the sample(s) to obtain data for you.
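
The check described above can be sketched in a few lines of Python. This is only an illustration: it assumes you have already copied the four signal intensity values from the annotation page of your chromatogram viewer into a dictionary, and the function name and the dictionary layout are hypothetical. The threshold of 50 is the "typically less than 50" figure quoted above.

```python
LOW_SIGNAL_THRESHOLD = 50  # "typically less than 50" in the text above


def looks_like_no_signal(called_sequence, signal_by_base,
                         threshold=LOW_SIGNAL_THRESHOLD):
    """Return True when a read matches the 'lack of signal' pattern:
    the base caller returned exactly 5 Ns and every per-base signal
    intensity is below the threshold."""
    only_five_ns = called_sequence.strip().upper() == "NNNNN"
    all_low = all(value < threshold for value in signal_by_base.values())
    return only_five_ns and all_low


# Intensities typed in by hand from the annotation page (made-up values).
print(looks_like_no_signal("NNNNN", {"A": 12, "C": 9, "G": 20, "T": 15}))
```

A result of False for a read that is still 5 "N"s suggests that there may be visible products, which is the case where reanalysis is worth requesting.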

The most common reasons for bad data

Over the years it has become obvious that certain common mistakes appear time and time again. Here they are in descending order of frequency (the solution to each problem is in brackets):

  1. Too little DNA (Check on agarose gel and use correct amount)
  2. Too much DNA (Check on agarose gel and use correct amount)
  3. Primer with too low Tm (Ensure Tm is about 60 °C; follow design guide)
  4. Multiple templates (Pick well-spaced colonies off plate)
  5. Multiple priming sites (Check to make sure template has only one priming site)
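
For a quick sanity check of item 3, a rough primer Tm can be estimated with the Wallace rule, Tm = 2 x (A+T) + 4 x (G+C), in °C. This is a sketch only: the rule is a crude approximation intended for short oligonucleotides, it is not necessarily the method used in our primer design guide, and the function name is hypothetical.

```python
def wallace_tm(primer):
    """Estimate primer Tm in degrees C with the Wallace rule:
    2 degrees per A or T, 4 degrees per G or C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("primer must contain only A, C, G and T")
    return 2 * at + 4 * gc


# A 20-mer with 50% GC comes out at the recommended value of about 60.
print(wallace_tm("ATGCATGCATGCATGCATGC"))  # prints 60
```

A primer that scores well below 60 by this estimate is a candidate for redesign; treat borderline values with caution, since other Tm formulas give somewhat different numbers.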

The "flat line" or "DOA - Dead on Analysis"

A total lack of any sequence can be very frustrating and does not give much information on which to base a solution to the problem. The most common reasons for this type of result are:

  1. One of the reaction components left out!
  2. Very low (<50 ng) amount of template DNA (top of list above)
  3. Lack of priming site for primer used
  4. Totally unsuitable primer used (e.g. very low Tm, degenerate)
  5. Reaction products lost on ethanol precipitation

Some of these problems are ones that we have control over (e.g. addition of reaction components) and we can make mistakes. We are, after all, only human! If you really think that the reaction should have worked, please contact us and we will be happy to repeat the sample using material already provided. If we have made a mistake (i.e. you get a good result with the repeat), then we will not charge for the repeat or the original. However, we do reserve the right to charge for both reactions should the repeat also fail (showing that the failure was not down to us). Hence, please double check everything before asking for repeats.

Truncated reads

Due to the nature of DNA sequencing, the signal intensity always tends to decrease somewhat as the sequence extends. However, this should not be to the extent that the sequence becomes unreadable within a short distance of the primer. When a sequence starts off okay and then "dies" quickly (typically after a couple of hundred bases), this is often a result of low template quantity (see the list above again!). However, there are other causes. Salt or ethanol in the template DNA reduces the processivity of Taq DNA polymerase and results in the inability of the enzyme to produce long extension products. Salt in plasmid preps can be due to insufficient washing of ethanol pellets, insufficient washing of resin columns (mini-preps) or residual Caesium Chloride in the sample (maxi-preps). Ethanol in the DNA from precipitations that have not been dried properly can also result in the same effect.

Too much template also causes this effect (although the reaction has to be massively overloaded with template), since an excess of priming events leads to an abundance of short products. A massive excess of DNA also leads to "retention" of the DNA on the capillary, so the sequence data only starts to come through when the run is almost finished. The intensity is also often too high for accurate sequence determination. This effect is seen as very broad, "ragged" peaks late in the read with nothing before them. The solution is to use the correct amount of DNA, which will not cause capillary retention.

Inaccurate data

Guess what the most common reason for this is? That's it, lack of template! If the signal is low, then the background starts to become a problem. Automated DNA sequencers will always try to find a sequence even if there actually isn't anything worth analysing. As stated above, if the base caller finds nothing at all to analyse, it will often just produce 5 "N"s. However, if it decides that there is something to analyse, it will try to do so. The base caller we use will not normally call "N"s, since it gives a quality score to each base instead (depending on your chromatogram reader, you may or may not be able to access this information). Hence a sequence full of mistaken base calls is usually due to weak sequence.

Even in good sequence, it is possible to find regions where the sequence is not so good. This can be due to template-specific effects (i.e. "odd" sequence) or to contamination with unincorporated dye terminators (see "Reading near the primer" below). Contamination with greater amounts of ethanol than those that cause short reads can yield generally poor sequence over the entire read. RNA in the sample will also do this, since the primers and Taq will tend to bind to it, resulting in weak or absent sequence. The host used to generate plasmid templates can also have a significant effect. Some hosts (such as HB101 and its derivatives, including TG1, TG2 and the JM100 series) contain large amounts of carbohydrates that are released on lysis and can contaminate DNA prepared from them. These strains also contain an intact endA locus and so produce a nuclease that can degrade the plasmid DNA and result in a poor template.

Reading near the primer

With any sequencing strategy there is a finite limit to how near to the primer it is possible to read. This is based on a number of factors to do with when chain termination begins and the resolving power of the system used to separate extension products. With automated DNA sequencers it is possible to read very near to the primer (up to a few bases away). However, with standard dye-terminator reactions, reading closer than 20-30 bases is not usually possible due to the inefficient recovery of very short extension products during the purification step. The ethanol precipitation step used during purification is a "differential precipitation" and relies on precipitating the extension products whilst leaving the dye terminators in solution. However, short extension products tend not to be recovered, which leads to very weak signals for these products. Should you wish to read very close to the primer, please contact us to discuss approaches that can be used to attempt this. In general, using a primer that is 50 bases or so away from the area where you wish to start reading is a safer and easier option.

Multiple sequences

This is quite different from what is seen with weak sequence, where the background signal becomes significant and leads to ambiguous base calls. When multiple sequences are seen, two or more distinct peaks are present at each base location (unless, of course, both/all bases happen to be the same one), resulting in "peaks on top of peaks". There are several causes of this. The presence of more than one template in the sample is a common cause. What is often seen is that the sequence is good within the plasmid sequence, then becomes unreadable past the cloning site. The plasmid backbone sequence is the same in both/all templates while the inserts are different, with predictable results. The presence of more than one priming site in the plasmid (or more than one primer in the reaction!) will also cause this, and the only way around it is to use a different primer.
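
If you have the expected template sequence as text, you can check for more than one priming site before submitting a sample. The sketch below is an illustration only, with hypothetical function names. It looks for exact matches of the primer on both strands, so it will not detect partial matches that may still prime, and a primer that is its own reverse complement is reported on both strands.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def _find_all(haystack, needle):
    """Return the 0-based start of every (possibly overlapping) match."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def priming_sites(template, primer):
    """Return (forward, reverse) lists of exact priming site positions.

    `forward` holds positions where the primer reads along the strand as
    given; `reverse` holds positions where it primes on the other strand.
    """
    template = template.upper()
    primer = primer.upper()
    forward = _find_all(template, primer)
    reverse = _find_all(template, reverse_complement(primer))
    return forward, reverse


# Toy example: one site on each strand, so this primer would give
# two superimposed sequences.
forward, reverse = priming_sites("CCGATTACACCTGTAATCCC", "GATTACA")
print(forward, reverse)  # prints [2] [11]
```

More than one position in total means a different primer is needed; no positions at all corresponds to the "lack of priming site" failure.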

Another reason for multiple sequences is poly A tails. Most cDNAs are generated by oligo dT priming. This results in a poly A/T stretch at the 3' end of the cDNA. The sequencing chemistry utilises Taq DNA polymerase. This polymerase is not very good at polymerising accurately through long stretches of homopolymeric sequence and it "slips"; that is, it will either add extra nucleotides or miss nucleotides out. Hence, not all the synthesised products will be of the same length. This results in multiple sequences after the poly A/T region. Depending on the length of the poly A/T section, the result can be either perfectly fine (generally less than 20 residues of A/T), mildly affected (20-40 A/Ts) or almost unreadable (over 40 A/Ts). In mild cases, it is often possible to correct the sequence by looking for "pre-peaks". A pre-peak is an identical (but weaker) sequence running 1 nt before the correct sequence, and is characterised by a small signal that is identical to the next major signal. For example, if there is a G followed by a C, one will see a weak C peak under the G peak. If this causes an ambiguous base call (N), then it is fairly easy to correct. Since this effect is a result of an inherent property of the polymerase, there is no way for us to correct it.
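
The length ranges above can be applied to a known or expected template sequence with a short script. This is a rough sketch with hypothetical function names. It treats an uninterrupted run of A or of T as the poly A/T stretch, and the cut-offs are the approximate figures quoted above rather than hard limits.

```python
import re


def longest_poly_at_run(sequence):
    """Length of the longest uninterrupted run of A or of T."""
    runs = re.findall(r"A+|T+", sequence.upper())
    return max((len(run) for run in runs), default=0)


def poly_at_effect(sequence):
    """Classify the expected effect on the read after the poly A/T stretch."""
    length = longest_poly_at_run(sequence)
    if length < 20:
        return "fine"
    if length <= 40:
        return "mildly affected"
    return "almost unreadable"


print(poly_at_effect("GGCC" + "A" * 25 + "GGCC"))  # prints mildly affected
```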

Poor primer synthesis has nothing to do with the template, but can produce the same effect. Primers are synthesised from the 3' end, so if things go wrong (e.g. a base is not incorporated) it is often at the 5' end (made last). Since the sequencing reaction products are extended from the 3' end, these errors are not removed (if a base was missing from the 3' end, the polymerase would simply fill it in and nobody would ever know!). If a proportion of the primer is n-1 (i.e. lacking the last base), then sequence products made with this primer will be of two lengths (n and n-1, plus whatever gets added by the polymerase). This will yield two sequences at every position. Depending on the amount of n-1 primer, this may or may not be a problem. Usually the amount of n-1 primer is very low (a good primer). However, if the amount of n-1 primer is above about 10% of the total, then it is possible to see this in the sequence as a "pre-read" (i.e. there is a small amount of the next base at every position in the sequence). Whilst it is possible to HPLC purify the full-length primer from the truncated one, an easier solution is to have the primer remade.

Unusual base composition of templates

Some templates have an unusual abundance of certain bases. They may, for example, be very rich in GC or AT. Alternatively, they may contain homopolymeric regions, where only a single base is present, or repeat sequences of certain bases. All of these characteristics can present problems. AT-rich templates can make it very difficult to design primers with the desired characteristics (see primer design), whilst GC-rich templates are often very difficult to sequence satisfactorily because the Taq polymerase has great difficulty separating the strands of DNA once they anneal together. Any secondary structures within a single strand also tend to be much more difficult for Taq to polymerise through. Altered reaction conditions can help (e.g. increased denaturation time/temperature and the inclusion of compounds that destabilise base pairing, such as DMSO at 5-10%), as can subcloning out smaller regions of the DNA. Homopolymeric regions of sequence are very difficult for Taq to read through accurately, since it tends to "slip" in such areas and miss out bases. This then results in a classical double sequence as described above. Long poly A tails on cDNA clones are often a cause of this type of artefact.
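
If you know the expected sequence of your template, you can get a rough idea of whether base composition is likely to be a problem before submitting it. The sketch below is an illustration only. The function names, the 50-base window and the 0.7 threshold are assumptions chosen for the example, not cut-offs that we apply.

```python
def gc_fraction(sequence):
    """Fraction of bases in the sequence that are G or C (0.0 to 1.0)."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    gc = sequence.count("G") + sequence.count("C")
    return gc / len(sequence)


def gc_rich_windows(sequence, window=50, threshold=0.7):
    """Return the 0-based start of every window of `window` bases whose
    GC fraction is at least `threshold` (both values are assumptions)."""
    sequence = sequence.upper()
    return [
        start
        for start in range(len(sequence) - window + 1)
        if gc_fraction(sequence[start:start + window]) >= threshold
    ]


print(gc_fraction("GGCCAATT"))  # prints 0.5
print(gc_rich_windows("AAAAGGGGCCCC", window=4, threshold=1.0))  # prints [4, 5, 6, 7, 8]
```

An overall fraction far from 0.5, or long runs of flagged windows, suggests that the altered reaction conditions or subcloning mentioned above may be worth discussing with us.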