https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472
--- Comment #13 from Hans Åberg <haberg-1 at telia dot com> --- (In reply to Tim Shen from comment #11) > If you think it's time and space efficiently doable, maybe you can develop a > prototype first. I put action number lists on the NFA transitions (no markup of the states). But then the things I described before work. Theoretically it may be simplest to start with a NFA with empty (ε) transitions: For the sub-automatons one is interested in to find matches for, put a unique action number on the start and final state transitions, marked begin resp. end match. When compacted to a NFA without empty transitions, which is what I use, they become lists. I use bra-ket notation <k|A|k> to indicate the action number k begin-end match of the automaton A; parentheses indicate grouping. Then the original example can be written <1|z|1>(<2|<3|a*|3><4|b*|4><5|c|5>|2>)* My program then produces the following result, first the NFA, and then the matches: ex = (ε ↦ {<1|0}, { 0: {z ↦ {|1><2|<3||3><4||4><5|3, |1><2|<3||3><4|2, |1><2|<3|1, |1>∅}} 1: {a ↦ {1, |3><4|2, |3><4||4><5|3}} 2: {b ↦ {2, |4><5|3}} 3: {c ↦ {|5>|2><2|<3||3><4||4><5|3, |5>|2><2|<3||3><4|2, |5>|2><2|<3|1, |5>|2>∅}}}, {}) > zaacbbbcac Full string is a valid match! Matches: {zaacbbbcac, [@1@4@8@10]; <1|z|1> <1|z|1> <2|<3|aa|3><4||4><5|c|5>|2> <1|z|1> <2|<3|aa|3><4||4><5|c|5>|2> <2|<3||3><4|bbb|4><5|c|5>|2> <1|z|1> <2|<3|aa|3><4||4><5|c|5>|2> <2|<3||3><4|bbb|4><5|c|5>|2> <2|<3|a|3><4||4><5|c|5>|2> } In the first NFA description, transition actions are written after the action that one pass over. So, for example, the original start state ε ↦ {<1|0} means that one opens the match <1| to arrive at state 1; then all possibilities in state 1 closes it with |1>, then opening some other matches. In the matches description, after the matched string follows, the final state match string lengths follows, so that eventual empty matches get number 0. The longest match last then have at the end the sequence <2|<3|a|3><4||4><5|c|5>|2> which is what is asked for: #2 "ac, #3 "a", #4 "", #5 "c", in addition to # 1 "z" first.