Hello,
I have strings represented as a combination of an alphabet (AGCT) and an operator "/" that signifies degeneracy. I want to split these strings into lists of lists, where the degeneracies are members of the same list and non-degenerates are members of single-item lists. An example will clarify this:
"ATT/GATA/G"
gets split to
[['A'], ['T'], ['T', 'G'], ['A'], ['T'], ['A', 'G']]
I have written a very ugly function to do this (listed below for the curious), but intuitively I think this should only take a couple of lines for one skilled in regex and/or listcomp. Any takers?
James
p.s. Here is the ugly function I wrote:
def build_consensus(astr):
    consensus = []       # the lol that will be returned
    possibilities = []   # one element of consensus
    consecutives = 0     # keeps track of how many in a row

    for achar in astr:
        if (achar == "/"):
            consecutives = 0
            continue
        else:
            consecutives += 1
        if (consecutives > 1):
            consensus.append(possibilities)
            possibilities = [achar]
        else:
            possibilities.append(achar)
    if possibilities:
        consensus.append(possibilities)
    return consensus
Hi,
in the spirit of "Now I have two problems" I like to avoid r.e. when I can. I don't think mine avoids a bit of ugly, but I, at least, find it easier to grok (YMMV):
def build_consensus(string):
    result = [[string[0]]]  # starts list with a list of first char
    accumulate = False
    for char in string[1:]:
        if char == '/':
            accumulate = True
        else:
            if accumulate:
                # The pop removes the last list appended, and we use
                # its single item to build the new list to append.
                result.append([result.pop()[0], char])
                accumulate = False
            else:
                result.append([char])
    return result
(Since list.append returns None, this could use
accumulate = result.append([result.pop()[0], char])
in place of the two lines in the if accumulate block, but I don't think that is a gain worth paying for.)
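For comparison, here is a rough sketch of the r.e./listcomp route the original post asked about. The name build_consensus_re is made up for illustration, and the sketch assumes the input holds only the letters A, G, C, T and "/", with every "/" sitting between two letters. Unlike my function above, which keeps only the first item of the popped list, it should also cope with more than two alternatives at a position (e.g. "A/G/T") and with an empty string:

```python
import re

# Hypothetical helper, not from the thread. Assumes the input contains only
# A, G, C, T and "/", and that every "/" sits between two letters.
def build_consensus_re(astr):
    # Each match is one position: a letter plus any "/letter" alternatives.
    groups = re.findall(r"[AGCT](?:/[AGCT])*", astr)
    # Splitting each match on "/" gives the list of alternatives.
    return [group.split("/") for group in groups]

print(build_consensus_re("ATT/GATA/G"))
# [['A'], ['T'], ['T', 'G'], ['A'], ['T'], ['A', 'G']]
```

Characters outside that alphabet are silently skipped by findall, so validate the input first if that matters.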
HTH,
Brian vdB
-- http://mail.python.org/mailman/listinfo/python-list