
Arnold G. Reinhold Thu, 16 Mar 2000 14:34:58 -0800
>Arnold G. Reinhold writes:
>
> > If you know the DNA sequences of alphabet letters, you can PCR probe
> > for common words or word fragments like "the" or "ing" and avoid
> > total sequencing.
>
>That's true. Luckily, there is no such test for random base sequences,
>though a pseudorandom sequence would certainly be very visible, but
>only if the genome has been totally sequenced (currently, an expensive
>and slow enterprise, despite Celera making large headways into
>it). Hence the need for steganography, which is further worsened by
>significant evolutionary conservation throughout the biological
>kingdom. The payload will not be very high.

I am not sure I understand the difference between "random" and 
"pseudorandom" as you are using it here. In any case, I expect more 
sensitive cryptanalytic tools for DNA can be developed if the need 
(and funding) arise. For example, has anyone done an n-tuple 
frequency analysis on natural DNA? Probes targeting n-tuples that are 
significantly less likely to occur in nature could be used to find 
human-generated DNA strings without total sequencing. It might even 
be possible to do something like autocorrelation by fragmenting the 
DNA, separating the strands, recombining and looking for 
complementary strands that bind inappropriately. (e.g. the first 
occurrence of "the" in strand A might bind to the second occurrence 
of "the" in strand A'.) You don't need the letter codes to do this.
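
A rough sketch of the n-tuple frequency idea, in Python. The function 
names and the choice of "never seen in the reference" as the rarity 
threshold are my own assumptions for illustration; a real analysis 
would need a large natural reference and would have to consider both 
strands.

```python
from collections import Counter
from itertools import product

def kmer_counts(seq, k):
    """Count every overlapping k-tuple in a DNA string."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def rare_kmers(reference, k, max_count=0):
    """List the k-tuples over ACGT seen at most max_count times in reference.

    These are candidate probe targets: a hit on one of them in a sample
    would be unexpected if the sample resembled the reference.
    """
    counts = kmer_counts(reference, k)
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted(km for km in all_kmers if counts[km] <= max_count)

if __name__ == "__main__":
    reference = "ACGTACGGTACCA"  # toy stand-in for natural DNA
    print(kmer_counts(reference, 2).most_common(3))
    print(rare_kmers(reference, 2))
```

The table grows as 4**k, so this brute-force listing is only practical 
for small k.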

>
> > A recent Genetic Engineering News says the price for synthetic DNA is
> > dropping from $1 per base to about $0.50 per base. That works out to
> > $0.25 per bit. That's about 8 orders of magnitude more expensive than
> > PC disk storage.
>
>This only applies to short sequences. If you have to (PCR-) ligate
>your sequences from shorter segments as output by the synthesizer
>robot, the price will skyrocket. Hey, nobody said it's going to be
>cheap, nor fast ;)

The problem seems to be error rates. Here is what one DNA synthesis 
company has to say:  http://www.alphadna.com/special.html#long 
oligonucleotides

>Longer than 35-mer oligonucleotides.  Polyacrylamide gel 
>electrophoresis (PAGE) purification, HPLC (high performance liquid 
>chromatography) purification
>
>       Let's assume that the efficiency of DNA synthesis is 99%. 
>With the addition of each consecutive base, the proportion of the 
>"aborted" oligonucleotides increases and at 40 bases the final 
>reaction will contain 67% "true" oligos and 23% shorter products. At 
>100 cycles only 36% of the products will be of the correct sequence. 
>Therefore, the synthesis of long oligos necessitates purification 
>by PAGE or HPLC,  the two reliable methods for purification of long 
>oligonucleotides.  For oligos longer than 50 bases, PAGE gives much 
>better results than HPLC.
>
>       We offer PAGE purification of oligonucleotides at the price 
>of additional $100 per oligo (35- to 70-mer) or $300 per oligo (70- 
>to 200-mer). In addition to this fee, we require an extra 24-48 
>hours to complete the PAGE purification.  Please note that PAGE 
>purification, although the best currently available method, does not 
>guarantee 100% error-free oligonucleotide products.  It was 
>reported by others that a PAGE-purified 123-mer and 126-mer, when 
>used for cloning, were proven to contain errors in about half of 
>the clones (Hecker KH, Rill R. Error analysis of chemically 
>synthesized polynucleotides.  Biotechniques 1998 Feb;24:256-60).
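
The yield figures in the quoted passage follow from compounding the 
per-base efficiency. A small Python check, assuming (as the vendor 
does) a constant 99% coupling efficiency and one coupling per base; 
the function name is mine:

```python
def full_length_fraction(n_bases, efficiency=0.99):
    """Fraction of product that is full length after n_bases couplings,
    assuming each coupling succeeds independently with the same efficiency."""
    return efficiency ** n_bases

if __name__ == "__main__":
    for n in (40, 100, 200):
        full = full_length_fraction(n)
        print(f"{n:3d} bases: {full:.1%} full length, {1 - full:.1%} shorter")
```

At 40 bases this gives about 67% full-length product, and at 100 
bases about 37%; the remainder in each case is truncated product.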

A mer (as in polymer) is one unit of the chain, here a single DNA 
base. There are four possibilities, so a mer encodes two bits. A 
200-mer chain holds 400 
bits. That's long enough to start thinking about packet technology. 
You could use ECC to deal with the base errors, or just assume you 
will have enough copies of each packet to do majority voting.
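
To make the packet idea concrete, here is a minimal Python sketch of 
packing two bits per base and recovering a packet by majority vote. 
The A/C/G/T to 00/01/10/11 mapping and the function names are 
arbitrary choices of mine, and the voting only handles substitution 
errors in equal-length reads; truncated or shifted copies would have 
to be filtered or aligned first.

```python
from collections import Counter

BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11

def bytes_to_bases(data):
    """Encode bytes as DNA letters, two bits per base, high bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def bases_to_bytes(seq):
    """Inverse of bytes_to_bases; len(seq) must be a multiple of 4."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

def majority_vote(copies):
    """Per-position consensus across equal-length reads of one packet."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*copies))

if __name__ == "__main__":
    packet = bytes_to_bases(b"the")
    reads = [packet, "G" + packet[1:], packet[:-1] + "A"]  # two damaged copies
    print(packet, bases_to_bytes(majority_vote(reads)))
```

With this packing a 200-mer carries 50 bytes, before any header or 
error-correction overhead.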

Anyway, I expect Moore's law will apply here as it does in 
electronics. Price per base might be a good number to chart over 
time. I don't think that Moore's time constant is due to the peculiar 
nature of semiconductors, but rather that it results from the 
so-far-unlimited richness of the technology. DNA technology is just 
as rich in possibilities as semiconductors. I think Moore's 18 months 
is the limit as resources go to infinity of the time needed for 
humans to understand the limitations of the last innovation and come 
up with an approach to overcome them.

>
>The good part is extremely dense storage (i dot can contain far more
>than microfilm), and potential for destruction on demand: via
>packaging in a container bisected with a breakable membrane, one part
>containing the DNA (precipitated, or as solution) and the other a
>strongly fragmenting chemical (DNAses probably too slow, something
>strongly oxidizing like concentrated perchloric acid should do).

You are right, of course, about density, but I'd be reluctant to rely 
on DNA's destructibility. On the contrary, I am told that PCR can 
reliably detect ten molecules and has a good chance of detecting a 
single molecule. If you are synthesizing 160 nmole, that is about 10**17 
molecules. You must completely destroy every one with high certainty.
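
The molecule count is just moles times Avogadro's number. A short 
Python check, with a hypothetical helper showing how many copies 
would be expected to survive a given destruction efficiency (the 
efficiencies tried are illustrative, not measured):

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def molecules(nanomoles):
    """Number of molecules in the given amount of substance."""
    return nanomoles * 1e-9 * AVOGADRO

def expected_survivors(n_molecules, destroyed_fraction):
    """Expected intact molecules if each is destroyed independently
    with probability destroyed_fraction."""
    return n_molecules * (1.0 - destroyed_fraction)

if __name__ == "__main__":
    n = molecules(160)
    print(f"160 nmole is about {n:.2e} molecules")
    for f in (0.999999, 0.999999999999):
        print(f"destroying {f} leaves about {expected_survivors(n, f):.2e}")
```

For 160 nmole this comes to roughly 9.6 x 10**16 molecules, so even 
destroying all but one part in 10**12 would leave tens of thousands 
of intact copies for PCR to find.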

Arnold Reinhold
Re: New York teen-ager win $100,000 with encryption research (3/14/2000)
