On Mar 7, 8:47 pm, Raymond Hettinger <pyt...@rcn.com> wrote:
> The existing groupby() itertool works great when every element in a
> group has the same key, but it is not so handy when groups are
> determined by boundary conditions.
>
> For edge-triggered events, we need to convert a boundary-event
> predicate to a groupby-style key function. The code below encapsulates
> that process in a new itertool called split_on().
>
> Would love you guys to experiment with it for a bit and confirm that
> you find it useful. Suggestions are welcome.
>
> Raymond
>
> -----------------------------------------
>
> from itertools import groupby, islice
>
> def split_on(iterable, event, start=True):
>     'Split iterable on event boundaries (either start events or stop events).'
>     # split_on('X1X23X456X', 'X'.__eq__, True)  --> X1 X23 X456 X
>     # split_on('X1X23X456X', 'X'.__eq__, False) --> X 1X 23X 456X
>     def transition_counter(x, start=start, cnt=[0]):
>         before = cnt[0]
>         if event(x):
>             cnt[0] += 1
>         after = cnt[0]
>         return after if start else before
>     return (g for k, g in groupby(iterable, transition_counter))
>
> if __name__ == '__main__':
>     for start in True, False:
>         for g in split_on('X1X23X456X', 'X'.__eq__, start):
>             print(list(g))
>         print()
>
>     from pprint import pprint
>     boundary = '--===============2615450625767277916==\n'
>     email = open('email.txt')
>     for mime_section in split_on(email, boundary.__eq__):
>         # skip the boundary line that opens each section
>         pprint(list(islice(mime_section, 1, None)))
>         print('= = ' * 30)
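The transition-counter trick in the quoted code can be sketched in Python 3 roughly as follows. This is a minimal sketch, not Raymond's exact code: the name `split_on_py3` is hypothetical, and it swaps the `cnt=[0]` default-argument counter for a closure variable updated with `nonlocal` (both give every call its own fresh counter). It also shows one property worth knowing: the groups share groupby()'s underlying iterator, so each group has to be consumed before the next one is requested.

```python
from itertools import groupby

def split_on_py3(iterable, event, start=True):
    """Split iterable on event boundaries (hypothetical Python 3 port).

    start=True: each event item begins a new group.
    start=False: each event item ends the current group.
    """
    count = 0  # number of event items seen so far, private to this call

    def transition_counter(x):
        nonlocal count
        before = count
        if event(x):
            count += 1
        # The returned key changes exactly at each event boundary.
        return count if start else before

    return (g for _, g in groupby(iterable, transition_counter))

if __name__ == '__main__':
    for start in (True, False):
        print([''.join(g) for g in split_on_py3('X1X23X456X', 'X'.__eq__, start)])
    # ['X1', 'X23', 'X456', 'X']
    # ['X', '1X', '23X', '456X']

    # The groups share groupby's underlying iterator: if the outer
    # generator is exhausted first, earlier groups come back empty.
    stale = list(split_on_py3('X1X23X456X', 'X'.__eq__))
    print(list(stale[0]))  # []
```

If the groups need to outlive the iteration, materialize each one as it arrives, e.g. `[list(g) for g in split_on_py3(...)]`.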
For me your examples don't justify why you would need such a general
algorithm. A split function that works on iterables instead of just
strings seems straightforward, so maybe we should have that, and keep
a more general function for when we have examples of problems where a
plain split does not work. Something like this should work for the two
examples you gave, where the boundaries are known constants (and
therefore there is really no need to keep them; I can always add them
back later):

def split_on(iterable, boundary):
    # Collect the items between boundaries; the boundaries are dropped.
    group = []
    for el in iterable:
        if el != boundary:
            group.append(el)
        else:
            yield group
            group = []
    yield group

def join_on(iterable, boundary):
    # Inverse of split_on: put the boundary back between consecutive groups.
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return
    yield from first
    for el in it:
        yield boundary
        yield from el

if __name__ == '__main__':
    lst = []
    for g in split_on('X1X23X456X', 'X'):
        print(g)
        lst.append(g)
    print()
    print(list(join_on(lst, 'X')))

--
http://mail.python.org/mailman/listinfo/python-list
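The plain split can also take a boundary predicate instead of a constant, which covers the MIME example from the quoted post as well. Below is a minimal Python 3 sketch under that assumption: `split_where` is a hypothetical name, and the sample email text is made up for illustration (only the boundary string comes from the quoted post). Because the boundary is a known constant in both examples, dropping it loses nothing; joining the pieces with it restores the input.

```python
import io

def split_where(iterable, is_boundary):
    """Yield lists of the items between boundaries (hypothetical helper).

    Items for which is_boundary(item) is true are dropped, like the
    constant-boundary split above.
    """
    group = []
    for item in iterable:
        if is_boundary(item):
            yield group
            group = []
        else:
            group.append(item)
    yield group  # trailing piece after the last boundary (may be empty)

if __name__ == '__main__':
    text = 'X1X23X456X'
    pieces = list(split_where(text, 'X'.__eq__))
    print(pieces)  # [[], ['1'], ['2', '3'], ['4', '5', '6'], []]
    # Re-inserting the constant boundary recovers the original input.
    print('X'.join(''.join(p) for p in pieces) == text)  # True

    boundary = '--===============2615450625767277916==\n'
    email = io.StringIO('preamble\n' + boundary + 'part one\n'
                        + boundary + 'part two\n')
    for section in split_where(email, boundary.__eq__):
        print(section)
```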