bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales

Norihiro Tanaka Fri, 28 Nov 2014 18:59:53 -0800

On Fri, 28 Nov 2014 16:50:29 +0100
Vincent Lefevre <[email protected]> wrote:
> What matters is whether a sequence corresponds to a valid UTF-8
> encoded Unicode character. My patch ensures that pcre_exec is called
> on a string with only such characters, which implies that this is
> also valid UTF-8 for PCRE (whether Unicode validity is also considered
> in valid_utf8() or not). So, there's no valid reason why grep would
> crash under such a condition.


It seems that PCRE treats, for example, the following byte sequences as
invalid.  That means we should not pass them to pcre_exec with the
PCRE_NO_UTF8_CHECK option.

  0xE0 0xC2 0xFF
  0xED 0xA0 0xFF
  0xF0 0xBF 0xFF 0xFF
  0xF4 0xBF 0xBF 0xBF
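
As a rough illustration of why these sequences are rejected, here is a
minimal sketch of a strict check following the well-formedness rules of
RFC 3629.  It is not the code from grep or PCRE; the names utf8_seq_len
and check_examples are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not grep's or PCRE's actual code.  Return the
   length (1..4) of the well-formed UTF-8 sequence at the start of S,
   or 0 if the first bytes of S are not well-formed per RFC 3629.  */
size_t
utf8_seq_len (const unsigned char *s, size_t n)
{
  if (n == 0)
    return 0;

  unsigned char c = s[0];
  if (c < 0x80)
    return 1;

  size_t len;
  unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of the 2nd byte */

  if (0xC2 <= c && c <= 0xDF)
    len = 2;
  else if (0xE0 <= c && c <= 0xEF)
    {
      len = 3;
      if (c == 0xE0)
        lo = 0xA0;              /* reject overlong encodings */
      if (c == 0xED)
        hi = 0x9F;              /* reject surrogates U+D800..U+DFFF */
    }
  else if (0xF0 <= c && c <= 0xF4)
    {
      len = 4;
      if (c == 0xF0)
        lo = 0x90;              /* reject overlong encodings */
      if (c == 0xF4)
        hi = 0x8F;              /* reject code points above U+10FFFF */
    }
  else
    return 0;                   /* 0x80..0xC1 and 0xF5..0xFF never lead */

  if (n < len)
    return 0;                   /* truncated sequence */
  if (s[1] < lo || hi < s[1])
    return 0;
  for (size_t i = 2; i < len; i++)
    if ((s[i] & 0xC0) != 0x80)
      return 0;                 /* not a continuation byte */
  return len;
}

/* The four sequences listed above are all rejected by this check.  */
void
check_examples (void)
{
  static const unsigned char a[] = { 0xE0, 0xC2, 0xFF };
  static const unsigned char b[] = { 0xED, 0xA0, 0xFF };
  static const unsigned char c[] = { 0xF0, 0xBF, 0xFF, 0xFF };
  static const unsigned char d[] = { 0xF4, 0xBF, 0xBF, 0xBF };

  assert (utf8_seq_len (a, sizeof a) == 0);
  assert (utf8_seq_len (b, sizeof b) == 0);
  assert (utf8_seq_len (c, sizeof c) == 0);
  assert (utf8_seq_len (d, sizeof d) == 0);
}
```

Under this sketch, a caller would set PCRE_NO_UTF8_CHECK only for a
buffer in which every position accepted by such a check starts a
well-formed sequence.  Whether PCRE's own validator agrees with it in
every case is an assumption that would need to be verified.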
