On 3/8/2011 4:49 PM, Erik de Castro Lopo wrote:
Wietse Venema wrote:

If you must match a very large number of patterns, you need an
implementation that transforms N patterns into one deterministic
automaton. This can match 1 pattern in the same time as N patterns.
Once the automaton is built (which takes some time) it is blindingly
fast. An example of such an implementation is flex.

Is there a limit on the pattern length in the pcre tables?

The pattern length limit is controlled by the pcre library you're using. I think most implementations limit single expressions to 64k characters.

It's unclear to me whether a single huge complex expression will evaluate faster than multiple less complex expressions.


If not, it would be possible to convert this (3 only, but could be
hundreds or even thousands):

    /^([0-9]{1,3}\.){4}\.dsl\.dynamic\.eranet\.pl$/
    /^([0-9]{1,3}\.){4}\.dynamic\.snap\.net\.nz$/
    /^([0-9]{1,3}\.){4}\.nat\.umts\.dynamic\.eranet\.pl$/

to this:

    
    /^([0-9]{1,3}\.){4}\.(dsl\.dynamic\.eranet\.pl|dynamic\.snap\.net\.nz|nat\.umts\.dynamic\.eranet\.pl)$/

and that should reject "1.1.1.1.not-found" in 1/3 the time of the
three original regexes while also matching quicker than the original.

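(Your sample expression looks a little wonky to me. Are you sure it works? The group ([0-9]{1,3}\.){4} already ends with a dot, so the \. right after it appears to require two dots in a row.)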

Performance would be better improved by enclosing the similar lines in an IF..ENDIF block. Non-matching input should be handled faster because the enclosed rules are skipped, and readability and maintainability are dramatically improved.
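
A minimal sketch of what that could look like, reusing the three patterns quoted above verbatim (wonkiness included). The guard pattern /\.dynamic\./ and the REJECT text are my own assumptions for illustration, not something from the original table; pick whatever cheap test all the enclosed rules have in common. Note that lines inside the block must not start with whitespace, since leading whitespace marks a continuation line in Postfix tables.

```
# Assumed guard: every enclosed pattern contains ".dynamic.", so input
# without it skips all three rules after one cheap test.
if /\.dynamic\./
# Patterns copied as posted; the action text is a placeholder.
/^([0-9]{1,3}\.){4}\.dsl\.dynamic\.eranet\.pl$/        REJECT dynamic client
/^([0-9]{1,3}\.){4}\.dynamic\.snap\.net\.nz$/          REJECT dynamic client
/^([0-9]{1,3}\.){4}\.nat\.umts\.dynamic\.eranet\.pl$/  REJECT dynamic client
endif
```

Something like "postmap -q 1.1.1.1.not-found pcre:/etc/postfix/example.pcre" (the file name is hypothetical) should let you check what a given input returns before putting the table into service.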

Skipping rules always beats evaluating rules.
Unreadable rules should be avoided.


  -- Noel Jones
