2016-04-21 5:07 GMT+02:00 Steven D'Aprano <st...@pearwood.info>:
> I want to group repeated items in a sequence. For example, I can group
> repeated sequences of a single item at a time using groupby:
>
>
> from itertools import groupby
> for key, group in groupby("AAAABBCDDEEEFFFF"):
>     group = list(group)
>     print(key, "count =", len(group))
>
>
> outputs:
>
> A count = 4
> B count = 2
> C count = 1
> D count = 2
> E count = 3
> F count = 4
>
>
> Now I want to group subsequences. For example, I have:
>
> "ABCABCABCDEABCDEFABCABCABCB"
>
> and I want to group it into repeating subsequences. I can see two ways to
> group it:
>
> ABC ABC ABCDE ABCDE F ABC ABC ABC B
>
> giving counts:
>
> (ABC) count = 2
> (ABCDE) count = 2
> F count = 1
> (ABC) count = 3
> B repeats 1 time
>
>
> or:
>
> ABC ABC ABC D E A B C D E F ABC ABC ABC B
>
> giving counts:
>
> (ABC) count = 3
> D count = 1
> E count = 1
> A count = 1
> B count = 1
> C count = 1
> D count = 1
> E count = 1
> F count = 1
> (ABC) count = 3
> B count = 1
>
>
>
> How can I do this? Does this problem have a standard name and/or solution?
>
>
>
>
> --
> Steven
>
> --
> https://mail.python.org/mailman/listinfo/python-list

Hi,
if I am not missing something, the latter form of grouping might be
achieved with the following regex:

    import re

    t = "ABCABCABCDEABCDEFABCABCABCB"
    # Group 2 lazily captures the shortest unit that is immediately
    # repeated at least once (\2+); otherwise a single character matches.
    grouped = re.findall(r"((?:(\w+?)\2+)|\w+?)", t)
    print(grouped)
    for grp, subseq in grouped:
        if subseq:
            # repeated unit: count its occurrences within the whole match
            print(subseq, grp.count(subseq))
        else:
            # no repetition found: a single item
            print(grp, "1")

the printed output is:

    [('ABCABCABC', 'ABC'), ('D', ''), ('E', ''), ('A', ''), ('B', ''),
     ('C', ''), ('D', ''), ('E', ''), ('F', ''), ('ABCABCABC', 'ABC'),
     ('B', '')]
    ABC 3
    D 1
    E 1
    A 1
    B 1
    C 1
    D 1
    E 1
    F 1
    ABC 3
    B 1

The former one seems to be trickier...

hth,
vbr
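
Since the regex only sees word characters in a string, the same scan can also be written without re, so that it works on lists or tuples as well. The following is only a rough sketch of that idea, not a standard recipe: the function name group_repeats is made up for illustration, and it assumes the input supports slicing and equality comparison. Like the regex, it takes the shortest unit that repeats immediately at each position and then consumes as many repetitions as it can.

```python
def group_repeats(seq):
    """Return a list of (unit, count) pairs, scanning left to right.

    Hypothetical helper: at each position, pick the shortest unit that is
    immediately repeated, then take as many repetitions as possible.
    Items without an adjacent repetition are emitted with count 1.
    """
    result = []
    i, n = 0, len(seq)
    while i < n:
        # Try unit sizes from shortest to longest; a unit must fit twice.
        for size in range(1, (n - i) // 2 + 1):
            unit = seq[i:i + size]
            if seq[i + size:i + 2 * size] == unit:
                count = 2
                # Extend greedily while the unit keeps repeating.
                while seq[i + count * size:i + (count + 1) * size] == unit:
                    count += 1
                result.append((unit, count))
                i += count * size
                break
        else:
            # No repeating unit starts here: emit a single item.
            result.append((seq[i:i + 1], 1))
            i += 1
    return result


for unit, count in group_repeats("ABCABCABCDEABCDEFABCABCABCB"):
    print(unit, "count =", count)
```

On the example string this should give the same grouping as the regex (ABC three times, then the single items, then ABC three times and a final B). It does not produce the former grouping, which would need some way of preferring longer units such as ABCDE over the first short match.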