Steven D'Aprano <st...@pearwood.info> writes: > I want to group repeated items in a sequence. For example, I can group > repeated sequences of a single item at a time using groupby: [...] > Now I want to group subsequences. For example, I have: > > "ABCABCABCDEABCDEFABCABCABCB" > > and I want to group it into repeating subsequences. I can see two ways to > group it: > > ABC ABC ABCDE ABCDE F ABC ABC ABC B [...] > or: > > ABC ABC ABC D E A B C D E F ABC ABC ABC B [...] > How can I do this? Does this problem have a standard name and/or solution?
Looks like a tough one. I don't think it has a name (I'm not even sure to be able to formally define it). Lets say it is an instance of "longest repeating substring" (https://en.wikipedia.org/wiki/Longest_repeated_substring_problem -- which really does not say much). Anyway, it looks like a job for a suffix trees. Depending on what you are after, you may also be interested in the sequitur algorithm (http://www.sequitur.info/). -- Alain. -- https://mail.python.org/mailman/listinfo/python-list