On Wed, Apr 20, 2016 at 11:11 PM Steven D'Aprano <st...@pearwood.info> wrote:
> I want to group [repeated] subsequences. For example, I have: > "ABCABCABCDEABCDEFABCABCABCB" > and I want to group it into repeating subsequences. I can see two > ways... How can I do this? Does this problem have a standard name and/or > solution? > I'm not aware of a standard name. This sounds like an unsupervised learning problem. There's no objectively correct answer unless you add more specificity to the problem statement. Regexes may sound tempting at first, but because a repeating subsequence may have nested repeating subsequences and this can go on infinitely, I think we at least need a push-down automata. I checked out some links for clustering algorithms that work on series subsequences and I found some fun results. Clustering is meaningless! http://www.cs.ucr.edu/~eamonn/meaningless.pdf I think you're in "no free lunch" territory. "Clustering of subsequence time series remains an open issue in time series clustering" http://www.hindawi.com/journals/tswj/2014/312521/ Any more detail on the problem to add constraints? -- https://mail.python.org/mailman/listinfo/python-list