On Thu, Apr 21, 2016 at 1:07 PM, Steven D'Aprano <st...@pearwood.info> wrote: > Now I want to group subsequences. For example, I have: > > "ABCABCABCDEABCDEFABCABCABCB" > > and I want to group it into repeating subsequences. I can see two ways to > group it: > > ABC ABC ABCDE ABCDE F ABC ABC ABC B > > or: > > ABC ABC ABC D E A B C D E F ABC ABC ABC B
Interesting. I've *almost* managed to (ab)use re.split for this purpose. A one-step solution can be done with re.match: >>> txt = "ABCABCABCDEABCDEFABCABCABCB" >>> re.match(r'(.+)\1+', txt) <_sre.SRE_Match object; span=(0, 9), match='ABCABCABC'> But split then returns only the grouped part: >>> re.split(r'(.+)\1+', txt) ['', 'ABC', 'DEABCDEF', 'ABC', 'B'] or *all* the grouped parts: >>> re.split(r'((.+)\2+)', txt) ['', 'ABCABCABC', 'ABC', 'DEABCDEF', 'ABCABCABC', 'ABC', 'B'] There's definitely a partial solution happening here, but I can't quite make it work. And no, I don't know if there's a standard name for it. ChrisA -- https://mail.python.org/mailman/listinfo/python-list