On Jan 31, Hong Zhang said:

>> But as you say, case folding is expensive. And with this approach you
>> are going to case-fold every string that is matched against an rx
>> that has some part of it that is case-insensitive.
>
>That is correct in general. But regex compiler can be smarter than that.
>For example, rx should optimize /a+/i to /[aA]+/ to avoid case-folding.
>If it is too difficult for rx to do case-folding, I think it is better
>to use some normalizer to do full-case folding.

Agh, if you go and do that, you must then be sure that rx is capable of
optimizing /a/i and /[aA]/ in the same way.  What I mean is that Perl's
current regex engine is able to treat the "abc" in /abc/i as a "constant"
substring to scan for, while it cannot do the same for /[Aa][Bb][Cc]/.
Why?  Because in the first case, the string being matched against has been
folded, so "abc" either is or is not in the string.  In the second case,
the string has not been folded, so scanning for that "constant" string
would require either

  a) temporary folding to look for "abc"
  b1) looking for "a" or "A" and then...
  b2) looking for "b" or "B" and then...
  b3) looking for "c" or "C"

That sounds like more effort than it's worth.  Perl's current engine
handles /abc/i much faster than /[Aa][Bb][Cc]/.
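
To make the two options concrete, here is a rough sketch in Python (not
Perl, and not how Perl's engine is actually written).  The function names
are made up for illustration, and str.lower() stands in for real case
folding, which is only adequate for ASCII text.

```python
def contains_folded(haystack: str, literal: str) -> bool:
    """Option (a): fold the haystack once, then do a plain substring scan."""
    # Assumption: lower() is used as a stand-in for full case folding.
    return literal.lower() in haystack.lower()


def contains_by_classes(haystack: str, literal: str) -> bool:
    """Options (b1)-(b3): at each position, accept either case of each char."""
    # One two-element set per character, like [Aa][Bb][Cc].
    classes = [{ch.lower(), ch.upper()} for ch in literal]
    n = len(classes)
    for start in range(len(haystack) - n + 1):
        if all(haystack[start + i] in classes[i] for i in range(n)):
            return True
    return False
```

Both give the same answer on ASCII input; the difference is that the first
one hands a single fixed string to an ordinary substring search, while the
second has to test a set at every character of every candidate position.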

-- 
japhy, the perl hacker with one red shoe





