https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472

--- Comment #13 from Hans Åberg <haberg-1 at telia dot com> ---
(In reply to Tim Shen from comment #11)
> If you think it's time and space efficiently doable, maybe you can develop a
> prototype first.

I put action number lists on the NFA transitions (no markup of the states). But
then the things I described before work.

Theoretically it may be simplest to start with a NFA with empty (ε)
transitions: For the sub-automatons one is interested in to find matches for,
put a unique action number on the start and final state transitions, marked
begin resp. end match. When compacted to a NFA without empty transitions, which
is what I use, they become lists.

I use bra-ket notation <k|A|k> to indicate the action number k begin-end match
of the automaton A; parentheses indicate grouping. Then the original example
can be written
  <1|z|1>(<2|<3|a*|3><4|b*|4><5|c|5>|2>)*

My program then produces the following result, first the NFA, and then the
matches:

ex = (ε ↦ {<1|0}, {
  0: {z ↦ {|1><2|<3||3><4||4><5|3, |1><2|<3||3><4|2, |1><2|<3|1, |1>∅}}
  1: {a ↦ {1, |3><4|2, |3><4||4><5|3}}
  2: {b ↦ {2, |4><5|3}}
  3: {c ↦ {|5>|2><2|<3||3><4||4><5|3, |5>|2><2|<3||3><4|2, |5>|2><2|<3|1,
|5>|2>∅}}}, {})

> zaacbbbcac
Full string is a valid match!
Matches: {zaacbbbcac, [@1@4@8@10];
<1|z|1>
<1|z|1> <2|<3|aa|3><4||4><5|c|5>|2>
<1|z|1> <2|<3|aa|3><4||4><5|c|5>|2> <2|<3||3><4|bbb|4><5|c|5>|2>
<1|z|1> <2|<3|aa|3><4||4><5|c|5>|2> <2|<3||3><4|bbb|4><5|c|5>|2>
<2|<3|a|3><4||4><5|c|5>|2>
}

In the first NFA description, transition actions are written after the action
that one pass over. So, for example, the original start state ε ↦ {<1|0} means
that one opens the match <1| to arrive at state 1; then all possibilities in
state 1 closes it with |1>, then opening some other matches.

In the matches description, after the matched string follows, the final state
match string lengths follows, so that eventual empty matches get number 0. The
longest match last then have at the end the sequence
  <2|<3|a|3><4||4><5|c|5>|2>
which is what is asked for: #2 "ac, #3 "a", #4 "", #5 "c", in addition to # 1
"z" first.

Reply via email to