On Mon, 28 Mar 2005 09:18:38 -0800, Michael Spencer <[EMAIL PROTECTED]> wrote:
> Bill Mill wrote:
>
> > for very long genomes he might want a generator:
> >
> > def xgen(s):
> >     l = len(s) - 1
> >     e = enumerate(s)
> >     for i,c in e:
> >         if i < l and s[i+1] == '/':
> >             e.next()
> >             i2, c2 = e.next()
> >             yield [c, c2]
> >         else:
> >             yield [c]
> >
> > >>> for g in xgen('ATT/GATA/G'): print g
> > ...
> > ['A']
> > ['T']
> > ['T', 'G']
> > ['A']
> > ['T']
> > ['A', 'G']
> >
> > Peace
> > Bill Mill
> > bill.mill at gmail.com
>
> works according to the original spec, but there are a couple of issues:
>
> 1. the output is specified to be a list, so delaying the creation of the list
> isn't a win

True. However, if it is a really long genome, he's not going to want to have
both a string of the genome and a list of the genome in memory. Instead, I
thought it might be useful to iterate through the genome so that it doesn't
have to be stored in memory. Since he didn't specify what he wants the list
for, it's possible that he just needs to iterate through the genome, grouping
degeneracies as he goes.

> 2. this version falls down in the presence of "double degeneracies" (if that's
> what they should be called) - which were not in the OP spec, but which cropped
> up in a later post :
> >>> list(xgen("AGC/C/TGA/T"))
> [['A'], ['G'], ['C', 'C'], ['/'], ['T'], ['G'], ['A', 'T']]

This is simple enough to fix, in basically the same way your function works.
I think it actually makes the function simpler:

def xgen(s):
    e = enumerate(s)
    stack = [e.next()[1]]  # push the first char onto the stack
    for i,c in e:
        if c != '/':
            yield stack
            stack = [c]
        else:
            stack.append(e.next()[1])
    yield stack

>>> gn
'ATT/GATA/G/AT'
>>> for g in xgen(gn): print g
...
['A']
['T']
['T', 'G']
['A']
['T']
['A', 'G', 'A']
['T']

Peace
Bill Mill
bill.mill at gmail.com
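
For anyone running this on Python 3, where e.next() is spelled next(e), here
is a rough sketch of the same stack-based approach. The handling of an empty
string and of a trailing '/' is an assumption on my part; neither case is
covered by the spec discussed in this thread.

```python
def xgen(s):
    """Yield each base as a list, merging '/'-joined alternatives.

    Sketch only: 'T/G' becomes ['T', 'G'] and 'G/A/T' becomes ['G', 'A', 'T'].
    """
    chars = iter(s)
    group = []
    for c in chars:
        if c == '/':
            # A slash attaches the following base to the current group.
            # Assumption: a trailing slash with nothing after it is ignored.
            nxt = next(chars, None)
            if nxt is not None:
                group.append(nxt)
        else:
            # A plain base closes the previous group and starts a new one.
            if group:
                yield group
            group = [c]
    if group:
        yield group


for g in xgen('ATT/GATA/G/AT'):
    print(g)
```

Because it walks a single iterator and never indexes into s, it should also
accept any iterable of characters, not just a string held fully in memory.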