>Arnold G. Reinhold writes:
>
> > If you know the DNA sequences of alphabet letters, you can PCR probe
> > for common words or word fragments like "the" or "ing" and avoid
> > total sequencing.
>
>That's true. Luckily, there is no such test for random base sequences,
>though a pseudorandom sequence would certainly be very visible, but
>only if the genome has been totally sequenced (currently, an expensive
>and slow enterprise, despite Celera making great headway on
>it). Hence the need for steganography, which is further worsened by
>significant evolutionary conservation throughout the biological
>kingdom. The payload will not be very high.
I am not sure I understand the difference between "random" and
"pseudorandom" as you are using it here. In any case, I expect more
sensitive cryptanalytic tools for DNA can be developed if the need
(and funding) arises. For example, has anyone done an n-tuple
frequency analysis on natural DNA? Probes targeting n-tuples that are
significantly less likely to occur in nature could be used to find
human-generated DNA strings without total sequencing. It might even
be possible to do something like autocorrelation by fragmenting the
DNA, separating the strands, recombining and looking for
complementary strands that bind inappropriately. (e.g. the first
occurrence of "the" in strand A might bind to the second occurrence
of "the" in strand A'.) You don't need the letter codes to do this.
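To make the n-tuple idea concrete, here is a minimal sketch of the counting step in Python. The function names and the toy reference string are my own placeholders, not real genomic data; a real analysis would run over a large natural corpus and pick n-tuples that are rare there as probe targets.

```python
from collections import Counter
from itertools import product

def kmer_counts(seq, k):
    """Count every overlapping k-tuple in seq (bases A/C/G/T assumed)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def rare_kmers(seq, k, max_count=0):
    """Return the k-tuples over ACGT that occur at most max_count times."""
    counts = kmer_counts(seq, k)
    candidates = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted(kmer for kmer in candidates if counts[kmer] <= max_count)

# Toy stand-in for natural DNA (hypothetical, for illustration only).
reference = "ACGTACGTTTGACCA"
print(kmer_counts(reference, 2).most_common(3))
print(rare_kmers(reference, 2))
```

Tuples that come back from rare_kmers on a natural reference are the ones a probe would target, since a hit on them is more likely to indicate a human-generated string.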
>
> > A recent Genetic Engineering News says the price for synthetic DNA is
> > dropping from $1 per base to about $0.50 per base. That works out to
> > $0.25 per bit. That's about 8 orders of magnitude more expensive than
> > PC disk storage.
>
>This only applies to short sequences. If you have to (PCR-) ligate
>your sequences from shorter segments as output by the synthesizer
>robot, the price will skyrocket. Hey, nobody said it's going to be
>cheap, nor fast ;)
The problem seems to be error rates. Here is what one DNA synthesis
company has to say: http://www.alphadna.com/special.html#long
oligonucleotides
>Longer than 35-mer oligonucleotides. Polyacrylamide gel
>electrophoresis (PAGE) purification, HPLC (high performance liquid
>chromatography) purification
>
> Let's assume that the efficiency of DNA synthesis is 99%.
>With the addition of each consecutive base, the proportion of the
>"aborted" oligonucleotides increases and at 40 bases the final
>reaction will contain 67% "true" oligos and 33% shorter products. At
>100 cycles only 36% of the products will be of the correct sequence.
>Therefore, the synthesis of long oligos necessitates purification
>by PAGE or HPLC, the two reliable methods for purification of long
>oligonucleotides. For oligos longer than 50 bases, PAGE gives much
>better results than HPLC.
>
> We offer PAGE purification of oligonucleotides at the price
>of additional $100 per oligo (35- to 70-mer) or $300 per oligo (70-
>to 200-mer). In addition to this fee, we require an extra 24-48
>hours to complete the PAGE purification. Please note that PAGE
>purification, although the best currently available method, does not
>guarantee 100% error-free oligonucleotide products. It was
>reported by others that a PAGE-purified 123-mer and 126-mer, when
>used for cloning, were proven to contain errors in about half of
>the clones (Hecker KH, Rill R. Error analysis of chemically
>synthesized polynucleotides. Biotechniques 1998 Feb;24:256-60).
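The yield figures in the quoted passage come from compounding the per-base coupling efficiency. A minimal sketch, assuming the vendor's simple model in which the full-length fraction is the efficiency raised to the number of bases (the function name is mine):

```python
def full_length_fraction(n_bases, efficiency=0.99):
    """Fraction of product that is full length after n_bases couplings,
    assuming every coupling succeeds independently with the same efficiency."""
    return efficiency ** n_bases

for n in (40, 100, 200):
    print(f"{n:3d}-mer: {full_length_fraction(n):.1%} full length")
```

Under this model roughly two thirds of a 40-mer batch is full length, a bit over a third at 100 bases, and only about 13% at 200 bases, which is why purification becomes mandatory for long oligos.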
A mer (as in polymer) is one unit of the chain, here a single DNA
base. There are four
possibilities, so a mer encodes two bits. A 200-mer chain holds 400
bits. That's long enough to start thinking about packet technology.
You could use ECC to deal with the base errors, or just assume you
will have enough copies of each packet to do majority voting.
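Here is a rough sketch of what I mean by packets with majority voting, in Python. The base-to-bits mapping (A=00, C=01, G=10, T=11) and the function names are my own assumptions, and the voting only handles substitution errors in equal-length reads; insertions and deletions would need alignment or a real ECC.

```python
from collections import Counter

BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11

def encode(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data for shift in (6, 4, 2, 0))

def decode(strand: str) -> bytes:
    """Inverse of encode; strand length is assumed to be a multiple of 4."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

def majority_vote(copies):
    """Per-position consensus across equal-length reads of one packet."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*copies))

packet = encode(b"the")
damaged = packet[:3] + "T" + packet[4:]   # one substituted base
reads = [packet, damaged, packet]
print(decode(majority_vote(reads)))
```

At two bits per base a 200-mer carries 50 bytes, so a packet header plus payload fits comfortably in one strand.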
Anyway, I expect Moore's law will apply here as it does in
electronics. Price per base might be a good number to chart over
time. I don't think that Moore's time constant is due to the peculiar
nature of semiconductors; rather, it results from the
so-far-unlimited richness of the technology. DNA technology is just
as rich in possibilities as semiconductors. I think Moore's 18 months
is the limit, as resources go to infinity, of the time humans need
to understand the limitations of the last innovation and come
up with an approach to overcome them.
>
>The good part is extremely dense storage (an i-dot can contain far more
>than microfilm), and potential for destruction on demand: via
>packaging in a container bisected with a breakable membrane, one part
>containing the DNA (precipitated, or as solution) and the other a
>strongly fragmenting chemical (DNAses probably too slow, something
>strongly oxidizing like concentrated perchloric acid should do).
You are right, of course, about density, but I'd be reluctant to rely
on DNA's destructibility. On the contrary, I am told that PCR can
reliably detect ten molecules and has a good chance of detecting a
single molecule. If you are synthesizing 160 nmole, that is about 10**17
molecules. You must completely destroy every one with high certainty.
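To put a number on "high certainty", here is a small sketch in Python. It assumes each molecule survives the destruction step independently with the same probability, which is certainly an idealization, and the function names are mine.

```python
import math

def required_log_reduction(n_molecules, max_expected_survivors=1.0):
    """Orders of magnitude of destruction needed so that, on average,
    no more than max_expected_survivors molecules remain."""
    return math.log10(n_molecules / max_expected_survivors)

def prob_none_survive(n_molecules, p_survive):
    """Probability that zero molecules survive, using the Poisson
    approximation exp(-n * p) for independent survival events."""
    return math.exp(-n_molecules * p_survive)

for n in (1e15, 1e17, 1e20):
    logs = required_log_reduction(n)
    print(f"{n:.0e} molecules: {logs:.0f} logs of destruction "
          f"to expect a single survivor")
```

Even when the expected number of survivors is pushed down to one, prob_none_survive says there is still only about a 37% chance that nothing at all is left for PCR to find.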
Arnold Reinhold