[Bug libstdc++/85472] Regex match bug

haberg-1 at telia dot com Tue, 01 May 2018 12:44:56 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472


--- Comment #17 from Hans Åberg <haberg-1 at telia dot com> ---
(In reply to Tim Shen from comment #16)
> (In reply to Hans Åberg from comment #15)
> > (In reply to Tim Shen from comment #14)
> > > How fast does your prototype run on the following example?
> > >   ((10)|(11)|(12)|...|(98)|(99))* matching "10111213141516...9899"
> > > how about this:
> > >   ((100)|(101)|(102)|...|(998)|(999))* matching "100101102...998999"
> > > 
> > > where "..." are the skipped strings that follow the surrounding pattern.
> > 
> > My program does not choke on limit cases, like the regex '(a?)^k a^k' on the
> > string string "a"^k, mentioned at [1]; for k = 100, it takes 0.393 s on the
> > NFA and 0.168 on the DFA on a not so powerful computer. Otherwise, it is not
> > optimized and fast in absolute terms: I use standard C++ containers such as
> > unordered_set for DFA state lookups, and not a number.
> > 
> > 1. https://swtch.com/~rsc/regexp/regexp1.html
> 
> Sorry, I meant to observe whether the example triggers a O(k^2) behavior. It
> should be less than quadratic. To measure this, you don't have to compare
> the time of your prototype against libstdc++.

I'll have to get back on this. But it shouldn't choke:

During the lexing, I just record the DFA states as function of the string
positions. When a final state appears, the reverse NFA data can be used to
compute the NFA states this match actually uses, and therefore, as the
characters are known, also the NFA transitions, and any data that depends on
it.

[Bug libstdc++/85472] Regex match bug

Reply via email to