On Fri, 28 Nov 2014 16:50:29 +0100
Vincent Lefevre <vinc...@vinc17.net> wrote:
> What matters is whether a sequence corresponds to a valid UTF-8
> encoded Unicode character. My patch ensures that pcre_exec is called
> on a string with only such characters, which implies that this is
> also valid UTF-8 for PCRE (whether Unicode validity is also considered
> in valid_utf8() or not). So, there's no valid reason why grep would
> crash under such a condition.

It seems that PCRE treats, e.g., the following characters as invalid.  This
means we should not pass these characters to pcre_exec with the
PCRE_NO_UTF8_CHECK option.

  0xE0 0xC2 0xFF
  0xED 0xA0 0xFF
  0xF0 0xBF 0xFF 0xFF
  0xF4 0xBF 0xBF 0xBF
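Below is a minimal C sketch of a stricter pre-check. It assumes PCRE's UTF-8 validation follows the RFC 3629 rules: no overlong forms, no UTF-16 surrogates (U+D800..U+DFFF), and nothing above U+10FFFF. `utf8_valid_strict` is a hypothetical helper and is not part of grep or PCRE. It rejects all four sequences above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not in grep or PCRE): return true iff
   buf[0..len) is well-formed UTF-8 under RFC 3629, i.e. no overlong
   forms, no surrogates, and no code points above U+10FFFF.  */
static bool
utf8_valid_strict (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = buf[i];
      size_t n;                          /* number of continuation bytes */
      unsigned int lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */

      if (c < 0x80)
        {
          i++;                           /* ASCII */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        n = 1;
      else if (c >= 0xE0 && c <= 0xEF)
        {
          n = 2;
          if (c == 0xE0)
            lo = 0xA0;                   /* reject overlong 3-byte forms */
          if (c == 0xED)
            hi = 0x9F;                   /* reject surrogates */
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          n = 3;
          if (c == 0xF0)
            lo = 0x90;                   /* reject overlong 4-byte forms */
          if (c == 0xF4)
            hi = 0x8F;                   /* reject code points > U+10FFFF */
        }
      else
        return false;                    /* 0x80..0xC1 or 0xF5..0xFF */

      if (len - i <= n)
        return false;                    /* truncated sequence */
      if (buf[i + 1] < lo || buf[i + 1] > hi)
        return false;
      for (size_t k = 2; k <= n; k++)
        if ((buf[i + k] & 0xC0) != 0x80)
          return false;                  /* not a continuation byte */
      i += n + 1;
    }
  return true;
}
```

A caller could pass PCRE_NO_UTF8_CHECK to pcre_exec only for buffers this accepts, and otherwise leave the option off so that PCRE performs its own check and reports an error instead of reading malformed input unchecked.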
