On 3/8/2011 6:00 PM, Erik de Castro Lopo wrote:
Noel Jones wrote:

The pattern length limit is controlled by the pcre library
you're using.  I think most implementations limit single
expressions to 64k characters.

Obviously something that needs testing.

Many years ago I worked on a system with a 32k limit on pcre expressions. Ever since then, everything I've checked has been 64k, and then I gave up checking. I expect any non-ancient system will support 64k, and some maybe even more. (To clarify for others following along, this is a limit on the number of characters in a single expression, not a limit on file size or on the number of expressions per file.)

Consider the input string '123-234-32-12.whatever' and now compare
matching against three rules:

      /^([0-9]{1,3}\.){4}foo$/
      /^([0-9]{1,3}\.){4}bar$/
      /^([0-9]{1,3}\.){4}baz$/

In this case, there will be three attempts (one on each pattern)
that fail on the fourth character ('-') of the input string. That
means that to fail all three patterns, there will be 12 character
comparisons.

Now compare that against:

      /^([0-9]{1,3}\.){4}(foo|bar|baz)$/

which will again fail on the fourth character, but there is only one
pattern which matches the same strings as the 3 patterns above.

In this example it's pretty easy to see that combining is better. It's not so clear whether 32k of complex gibberish will actually operate faster, as there may be significant startup time. YMMV and all that.
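
For anyone who wants to convince themselves that the combined rule accepts exactly the same strings as the three separate rules, here is a rough sketch. It uses Python's re module as a stand-in for pcre (an assumption on my part; the syntax used here is common to both), and the sample strings and helper names are made up for illustration. It checks equivalence only and says nothing about speed.

```python
import re

# The three separate rules from the example above.
SEPARATE = [
    re.compile(r"^([0-9]{1,3}\.){4}foo$"),
    re.compile(r"^([0-9]{1,3}\.){4}bar$"),
    re.compile(r"^([0-9]{1,3}\.){4}baz$"),
]

# The single combined rule.
COMBINED = re.compile(r"^([0-9]{1,3}\.){4}(foo|bar|baz)$")


def matches_any(text):
    """True if at least one of the three separate rules matches."""
    return any(p.search(text) is not None for p in SEPARATE)


def matches_combined(text):
    """True if the combined rule matches."""
    return COMBINED.search(text) is not None


# Hypothetical test inputs; the first is the string from the example.
SAMPLES = [
    "123-234-32-12.whatever",
    "1.2.3.4.foo",
    "10.20.30.40.bar",
    "192.168.1.1.baz",
    "1.2.3.foo",
    "1.2.3.4.qux",
]

for s in SAMPLES:
    print(s, matches_any(s), matches_combined(s))
```

Both columns should agree for every input. Whether the combined form is actually faster on a real table still needs measuring on your own system.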

BTW, with pcre you should use the non-capturing flag (?:...) inside parentheses if you're not doing $n substitutions. This saves another smidgen of time and memory.
/^(?:[0-9]{1,3}\.){4}(?:foo|bar|baz)$/
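
A quick way to see what the (?:...) form changes, again sketched with Python's re module standing in for pcre (an assumption; the group syntax is the same in both). The input string is made up for illustration.

```python
import re

# Capturing groups: the engine has to remember what each group matched.
CAPTURING = re.compile(r"^([0-9]{1,3}\.){4}(foo|bar|baz)$")

# Non-capturing groups: same grouping, nothing saved for $n.
NON_CAPTURING = re.compile(r"^(?:[0-9]{1,3}\.){4}(?:foo|bar|baz)$")

text = "1.2.3.4.bar"  # hypothetical input

# .groups is the number of capturing groups in the compiled pattern;
# .groups() on a match returns what those groups captured.
print(CAPTURING.groups, CAPTURING.match(text).groups())
print(NON_CAPTURING.groups, NON_CAPTURING.match(text).groups())
```

Both patterns match the same strings. The only difference is that the second one keeps no captured substrings, which is fine as long as nothing on the right-hand side refers to $1 or $2.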


 -- Noel Jones
