Re: regular expressions was: Kernel Oops

Wietse Venema Tue, 08 Mar 2011 15:47:27 -0800

mouss:
> On 08/03/2011 23:49, Erik de Castro Lopo wrote:
> > Wietse Venema wrote:
> > 
> >> If you must match a very large number of patterns, you need an
> >> implementation that transforms N patterns into one deterministic
> >> automaton. This can match 1 pattern in the same time as N patterns.
> >> Once the automaton is built (which takes some time) it is blindingly
> >> fast. An example of such an implementation is flex.
> > 
> > Is there a limit to the pattern length in the pcre tables?
> > 
> > If not, it would be possible to convert this (3 only, but could be
> > hundreds or even thousands):
> > 
> >    /^([0-9]{1,3}\.){4}dsl\.dynamic\.eranet\.pl$/
> >    /^([0-9]{1,3}\.){4}dynamic\.snap\.net\.nz$/
> >    /^([0-9]{1,3}\.){4}nat\.umts\.dynamic\.eranet\.pl$/
> > 
> > to this:
> > 
> >    /^([0-9]{1,3}\.){4}(dsl\.dynamic\.eranet\.pl|dynamic\.snap\.net\.nz|nat\.umts\.dynamic\.eranet\.pl)$/
> > 
> > and that should reject "1.1.1.1.not-found" in 1/3 of the time taken by the
> > three original regexes, while also matching more quickly than the originals.
> 
> 
> Your speculations are wrong. /(joe|foo|bar)/ isn't 3 times faster than
> individual tests. But above all, "premature optimisation is the root of
> all evil". One should not convert readable stuff into unmaintainable
> hieroglyphs without measuring the real benefits.


In the Postfix implementation, each regexp/pcre pattern is executed
separately, so a single (a|b|c) rule is faster than separate rules
for a, b and c. The savings are noticeable only in body_checks.
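To illustrate the point about per-rule execution, here is a minimal Python sketch, not Postfix code. It models a table as an ordered list of rules tried one at a time (first match wins) and compares that with one rule using alternation. The host names come from the quoted example; the function names lookup_separate and lookup_combined are hypothetical, and Python's re engine stands in for PCRE. In keeping with the advice to measure first, it times both variants instead of assuming a speedup.

```python
import re
import timeit

# Three separate rules, as in the quoted example (without the doubled dot).
SEPARATE = [
    re.compile(r"^([0-9]{1,3}\.){4}dsl\.dynamic\.eranet\.pl$"),
    re.compile(r"^([0-9]{1,3}\.){4}dynamic\.snap\.net\.nz$"),
    re.compile(r"^([0-9]{1,3}\.){4}nat\.umts\.dynamic\.eranet\.pl$"),
]

# The same three rules folded into a single pattern with alternation.
COMBINED = re.compile(
    r"^([0-9]{1,3}\.){4}"
    r"(dsl\.dynamic\.eranet\.pl|dynamic\.snap\.net\.nz|nat\.umts\.dynamic\.eranet\.pl)$"
)


def lookup_separate(name: str) -> bool:
    """Hypothetical table lookup: try each rule in turn, stop at the first match."""
    return any(rule.search(name) for rule in SEPARATE)


def lookup_combined(name: str) -> bool:
    """Hypothetical table lookup with one combined rule."""
    return COMBINED.search(name) is not None


SAMPLES = [
    "1.2.3.4.dsl.dynamic.eranet.pl",
    "10.20.30.40.dynamic.snap.net.nz",
    "1.1.1.1.not-found",
]

if __name__ == "__main__":
    for name in SAMPLES:
        print(name, lookup_separate(name), lookup_combined(name))
    # Measure rather than guess; timings depend on the engine and the input.
    for fn in (lookup_separate, lookup_combined):
        seconds = timeit.timeit(lambda: [fn(s) for s in SAMPLES], number=10_000)
        print(fn.__name__, f"{seconds:.3f}s")
```

Both lookups must agree on every input; only the timing is allowed to differ, and how much it differs is exactly what the measurement is for.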

As for large numbers of CIDR patterns, I was referring to files
with 100,000 patterns. That is a non-trivial number, and I took
care to implement this such that postscreen could handle them.
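The earlier remark about turning N patterns into one deterministic automaton also applies to CIDR lists. The sketch below is a hypothetical illustration, not how Postfix or postscreen implements it: IPv4 prefixes are stored in a binary trie, so a lookup walks at most 32 bits whether the table holds 3 entries or 100,000. The class name CidrTrie is made up. Note one semantic difference: Postfix cidr tables return the first match in file order, whereas this sketch returns the longest (most specific) matching prefix.

```python
import ipaddress
from typing import Optional


class CidrTrie:
    """Hypothetical binary trie over IPv4 address bits (one level per bit)."""

    def __init__(self) -> None:
        # Each node is a dict with optional children "0"/"1" and an optional "value".
        self.root: dict = {}

    def insert(self, cidr: str, value: str) -> None:
        net = ipaddress.IPv4Network(cidr, strict=False)
        # Only the first prefixlen bits of the network address select the node.
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for b in bits:
            node = node.setdefault(b, {})
        node["value"] = value

    def lookup(self, addr: str) -> Optional[str]:
        """Return the value of the longest matching prefix, or None."""
        bits = format(int(ipaddress.IPv4Address(addr)), "032b")
        node = self.root
        best = node.get("value")  # a 0.0.0.0/0 entry matches everything
        for b in bits:
            node = node.get(b)
            if node is None:
                break  # no stored prefix extends further along this path
            best = node.get("value", best)
        return best


if __name__ == "__main__":
    trie = CidrTrie()
    trie.insert("192.168.0.0/16", "REJECT")
    trie.insert("192.168.1.0/24", "PERMIT")
    print(trie.lookup("192.168.1.7"))  # PERMIT
    print(trie.lookup("192.168.2.7"))  # REJECT
    print(trie.lookup("10.0.0.1"))     # None
```

The cost of building the trie grows with the number of entries, but each lookup is bounded by the address length, which is the same trade-off described for flex-style automata.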

I do agree with all the comments about skipping patterns with
IF/ENDIF or terminating matches early (which PCRE is very good at
if you use look-ahead and look-behind).
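Postfix regexp and pcre tables support "if /pattern/" ... "endif" blocks, so a whole group of rules is skipped when the guard does not match. The Python sketch below models only that idea; it is not Postfix's table parser. One guard on the domain suffix decides whether the anchored per-host rules are tried at all. The names GUARD, GUARDED_RULES and lookup are hypothetical, and the table fragment in the comment is illustrative.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[re.Pattern, str]

# Hypothetical rule set modelled on a pcre table with an if/endif block:
#   if /\.eranet\.pl$/
#   /^([0-9]{1,3}\.){4}dsl\.dynamic\.eranet\.pl$/        REJECT dynamic
#   /^([0-9]{1,3}\.){4}nat\.umts\.dynamic\.eranet\.pl$/  REJECT dynamic
#   endif
GUARD = re.compile(r"\.eranet\.pl$")
GUARDED_RULES: List[Rule] = [
    (re.compile(r"^([0-9]{1,3}\.){4}dsl\.dynamic\.eranet\.pl$"), "REJECT dynamic"),
    (re.compile(r"^([0-9]{1,3}\.){4}nat\.umts\.dynamic\.eranet\.pl$"), "REJECT dynamic"),
]


def lookup(name: str) -> Tuple[Optional[str], int]:
    """Return (result, number of inner rules tried)."""
    if not GUARD.search(name):
        return None, 0  # the whole block is skipped after one test
    tried = 0
    for pattern, result in GUARDED_RULES:
        tried += 1
        if pattern.search(name):
            return result, tried  # first match wins
    return None, tried


if __name__ == "__main__":
    for host in ("1.2.3.4.dsl.dynamic.eranet.pl", "mail.example.com"):
        print(host, lookup(host))
```

The returned counter makes the saving visible: names outside the guarded domain never reach the inner rules, however many of them the block contains.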

        Wietse
