I don't know the details of the algorithm for generating a state
machine to match a PCRE.  I was wondering if there is an optimization
so that matching an alteration of subpatterns is faster than than
matching each subpattern sequentially.

On Tue, Jan 15, 2019 at 11:21 AM Leif Hedstrom <zw...@apache.org> wrote:
>
>
>
> > On Jan 15, 2019, at 10:09 AM, Walt Karas <wka...@oath.com.INVALID> wrote:
> >
> > When we remap we go through the list of regexs looking for a match.
> > What if we took all the regexes, removed subpattern captures, and
> > created one giant pattern with alternation and capture of the regex
> > for each rule, like:
> >
> > (regex1)|(regex2)|...|(regexN)
> >
> > We could then just step through the array of subpattern capture
> > offsets to find the matching rule.  Then do a second match against
> > just the original rule regex to get any captures it contains.  Would
> > this work, and would it be faster?
>
>
> That could potentially be a lot slower, no? Imagine you have a remap today 
> with
>
>         regex_map <regex1> ...
>         regex_map <regex2> ...
>         regex_map <regex3> ...
>         regex_map <regex4> …
>
>
> If you know what you are doing, and have some luck, you could order these 
> such as the most common regex is matched early.  And first match wins, and 
> therefore, you could stop evaluating regexes early. Even in an average case, 
> you’d only evaluate half of the regexes, right?
>
> But if you combine all these into one regex, you essentially have to evaluate 
> all regexes all the time, regardless of order and regardless if something has 
> matched.
>
> — Leif
>
>

Reply via email to