mouss:
> On 08/03/2011 23:49, Erik de Castro Lopo wrote:
> > Wietse Venema wrote:
> >
> >> If you must match a very large number of patterns, you need an
> >> implementation that transforms N patterns into one deterministic
> >> automaton. This can match 1 pattern in the same time as N patterns.
> >> Once the automaton is built (which takes some time) it is blindingly
> >> fast. An example of such an implementation is flex.
> >
> > Is there a limit on the pattern length in the pcre tables?
> >
> > If not, it would be possible to convert this (3 only, but could be
> > hundreds or even thousands):
> >
> > /^([0-9]{1,3}\.){4}\.dsl\.dynamic\.eranet\.pl$/
> > /^([0-9]{1,3}\.){4}\.dynamic\.snap\.net\.nz$/
> > /^([0-9]{1,3}\.){4}\.nat\.umts\.dynamic\.eranet\.pl$/
> >
> > to this:
> >
> > /^([0-9]{1,3}\.){4}\.(dsl\.dynamic\.eranet\.pl|dynamic\.snap\.net\.nz|nat\.umts\.dynamic\.eranet\.pl)$/
> >
> > and that should reject "1.1.1.1.not-found" in 1/3 the time of the
> > three original regexes while also matching quicker than the original.
>
> Your speculations are wrong. /(joe|foo|bar)/ isn't 3 times faster than
> individual tests. But above all, "premature optimisation is the root of
> all evil". One should not convert readable stuff to unmaintainable
> hieroglyphs without measuring the real benefits.

In the Postfix implementation, each regexp/pcre pattern is executed
separately; therefore (a|b|c) is faster than separate rules for a, b
and c. The savings are noticeable only in body_checks.

As for large numbers of CIDR patterns, I was referring to files with
100,000 patterns. That is a non-trivial number, and I took care to
implement this such that postscreen could handle them.

I do agree with all the comments about skipping patterns with IF/ENDIF
or terminating matches early (which PCRE is very good at if you use
look-ahead and look-behind).

	Wietse
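
For anyone who wants to measure rather than speculate, here is a minimal
sketch in Python of the two approaches discussed above: one rule per
pattern, tried in turn, versus a single rule using (a|b|c) alternation.
It is not Postfix code, and Python's re module is not the PCRE library
that Postfix uses, so any timing taken from it is only a rough
indication. The names PREFIX, SUFFIXES, match_separate and
match_combined are hypothetical, and the prefix is simplified to four
dot-terminated octets, which is an assumption about the intended
hostname format.

```python
import re

# Assumption: hostnames look like "1.2.3.4.<suffix>".
PREFIX = r"^([0-9]{1,3}\.){4}"

# The three suffixes from the example in this thread.
SUFFIXES = [
    r"dsl\.dynamic\.eranet\.pl",
    r"dynamic\.snap\.net\.nz",
    r"nat\.umts\.dynamic\.eranet\.pl",
]

# One compiled pattern per rule, like separate lines in a lookup table.
SEPARATE = [re.compile(PREFIX + suffix + "$") for suffix in SUFFIXES]

# One compiled pattern with all suffixes joined by alternation.
COMBINED = re.compile(PREFIX + "(?:" + "|".join(SUFFIXES) + ")$")


def match_separate(name):
    """Try each pattern in turn and stop at the first one that matches."""
    return any(pattern.search(name) for pattern in SEPARATE)


def match_combined(name):
    """Run the single alternation pattern once."""
    return COMBINED.search(name) is not None
```

Both functions should return the same answer for any input; only the
cost differs. Timing them with the timeit module on realistic input
gives a first impression of the difference, but the real benefit has to
be measured on the Postfix tables themselves.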