Yes, this would depend heavily on how much optimization / transform the RE compiler does. After all, Leif's argument would seem to work against using a Trie, but that is in fact faster because of pattern matching re-organization.
On Tue, Jan 15, 2019 at 11:29 AM Walt Karas <wka...@oath.com.invalid> wrote: > I don't know the details of the algorithm for generating a state > machine to match a PCRE. I was wondering if there is an optimization > so that matching an alteration of subpatterns is faster than than > matching each subpattern sequentially. > > On Tue, Jan 15, 2019 at 11:21 AM Leif Hedstrom <zw...@apache.org> wrote: > > > > > > > > > On Jan 15, 2019, at 10:09 AM, Walt Karas <wka...@oath.com.INVALID> > wrote: > > > > > > When we remap we go through the list of regexs looking for a match. > > > What if we took all the regexes, removed subpattern captures, and > > > created one giant pattern with alternation and capture of the regex > > > for each rule, like: > > > > > > (regex1)|(regex2)|...|(regexN) > > > > > > We could then just step through the array of subpattern capture > > > offsets to find the matching rule. Then do a second match against > > > just the original rule regex to get any captures it contains. Would > > > this work, and would it be faster? > > > > > > That could potentially be a lot slower, no? Imagine you have a remap > today with > > > > regex_map <regex1> ... > > regex_map <regex2> ... > > regex_map <regex3> ... > > regex_map <regex4> … > > > > > > If you know what you are doing, and have some luck, you could order > these such as the most common regex is matched early. And first match > wins, and therefore, you could stop evaluating regexes early. Even in an > average case, you’d only evaluate half of the regexes, right? > > > > But if you combine all these into one regex, you essentially have to > evaluate all regexes all the time, regardless of order and regardless if > something has matched. > > > > — Leif > > > > > -- *Beware the fisherman who's casting out his line in to a dried up riverbed.* *Oh don't try to tell him 'cause he won't believe. Throw some bread to the ducks instead.* *It's easier that way. *- Genesis : Duke : VI 25-28