Re: Detecting repeated subsequences of identical items

Chris Angelico Wed, 20 Apr 2016 22:41:58 -0700

On Thu, Apr 21, 2016 at 1:07 PM, Steven D'Aprano <[email protected]> wrote:
> Now I want to group subsequences. For example, I have:
>
> "ABCABCABCDEABCDEFABCABCABCB"
>
> and I want to group it into repeating subsequences. I can see two ways to
> group it:
>
> ABC ABC ABCDE ABCDE F ABC ABC ABC B
>
> or:
>
> ABC ABC ABC D E A B C D E F ABC ABC ABC B


Interesting. I've *almost* managed to (ab)use re.split for this
purpose. A one-step solution can be done with re.match:

>>> txt = "ABCABCABCDEABCDEFABCABCABCB"
>>> re.match(r'(.+)\1+', txt)
<_sre.SRE_Match object; span=(0, 9), match='ABCABCABC'>

But split then returns only the grouped part:

>>> re.split(r'(.+)\1+', txt)
['', 'ABC', 'DEABCDEF', 'ABC', 'B']

or *all* the grouped parts:

>>> re.split(r'((.+)\2+)', txt)
['', 'ABCABCABC', 'ABC', 'DEABCDEF', 'ABCABCABC', 'ABC', 'B']

There's definitely a partial solution happening here, but I can't
quite make it work.

And no, I don't know if there's a standard name for it.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Detecting repeated subsequences of identical items

Reply via email to