>Others have probably solved your basic problem, or pointed
>the way. I'm just curious.
>Given that the information content is 2 bits per character
>that is taking up 8 bits of storage, there must be a good reason
>for storing and/or transmitting them this way? I.e., it is easy
>to think up a count-prefixed compressed format packing 4:1 in
>subsequent data bytes (except for the last byte, which has
>fewer than 4 2-bit codes).

My guess is that the storage is inefficient because the format is human-readable, and because most in-silico molecular biology is just a bunch of fancy string algorithms. That is my limited view of these things, at least.

>I'm wondering how the data is actually used once records are
>retrieved.

This one I can answer. For my purposes, I'm just organizing the sequences at hand, but there are all sorts of things one could actually do with sequences: alignments, BLAST searches, gene annotations, etc.

-- 
http://mail.python.org/mailman/listinfo/python-list
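
P.S. For what it's worth, the count-prefixed 4:1 packing described in the quoted question could look roughly like the sketch below. Everything specific here is my own assumption, not an existing format: the A/C/G/T code table, the 4-byte big-endian count, and the names pack() and unpack() are made up for illustration. It also only handles the four plain bases, so anything with N or other ambiguity codes would need more than 2 bits per character.

```python
import struct

# Hypothetical 2-bit code table; the thread does not specify one.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack an ACGT string as a 4-byte big-endian count plus 2 bits per base."""
    out = bytearray(struct.pack(">I", len(seq)))
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for ch in chunk:
            byte = (byte << 2) | CODES[ch]
        # Left-align a short final chunk so decoding always reads from the high bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data):
    """Inverse of pack(): read the count, then peel off 2-bit codes."""
    (count,) = struct.unpack(">I", data[:4])
    bases = []
    for byte in data[4:]:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    # The count prefix tells us how many codes in the last byte are padding.
    return "".join(bases[:count])
```

With this, unpack(pack("GATTACA")) gives back "GATTACA", and the seven bases take two data bytes plus the four-byte count instead of seven bytes of text. The saving only becomes interesting for long sequences, and you lose the ability to read the file in a text editor.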