On Fri, 28 Nov 2014 16:50:29 +0100
Vincent Lefevre <vinc...@vinc17.net> wrote:

> What matters is whether a sequence corresponds to a valid UTF-8
> encoded Unicode character. My patch ensures that pcre_exec is called
> on a string with only such characters, which implies that this is
> also valid UTF-8 for PCRE (whether Unicode validity is also considered
> in valid_utf8() or not). So, there's no valid reason why grep would
> crash under such a condition.

It seems that PCRE treats, for example, the following byte sequences as
invalid. That means we should not pass them to pcre_exec with the
PCRE_NO_UTF8_CHECK option.

  0xE0 0xC2 0xFF
  0xED 0xA0 0xFF
  0xF0 0xBF 0xFF 0xFF
  0xF4 0xBF 0xBF 0xBF
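
For reference, here is a minimal sketch of a strict per-character check that follows the well-formed byte ranges of RFC 3629. It is not the code from grep or from PCRE's valid_utf8(); the names utf8_valid_len and in_range are hypothetical. It also assumes the bytes listed above are read as four sequences, split at each lead byte (0xE0, 0xED, 0xF0, 0xF4).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of grep or PCRE. */
static int in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
  return lo <= b && b <= hi;
}

/* Hypothetical function: return the length (1..4) of the well-formed
   UTF-8 sequence starting at s, or 0 if the bytes at s do not form one.
   Ranges follow the UTF8-char grammar of RFC 3629.  */
size_t utf8_valid_len(const unsigned char *s, size_t len)
{
  /* Allowed range for the second byte; narrowed for some lead bytes. */
  unsigned char lo = 0x80, hi = 0xBF;
  size_t need, i;

  assert(s != NULL || len == 0);
  if (len == 0)
    return 0;

  if (s[0] <= 0x7F)
    return 1;                           /* ASCII */
  else if (in_range(s[0], 0xC2, 0xDF))
    need = 2;                           /* 0xC0 and 0xC1 would be overlong */
  else if (in_range(s[0], 0xE0, 0xEF))
    {
      need = 3;
      if (s[0] == 0xE0) lo = 0xA0;      /* reject overlong forms */
      if (s[0] == 0xED) hi = 0x9F;      /* reject surrogates U+D800..U+DFFF */
    }
  else if (in_range(s[0], 0xF0, 0xF4))
    {
      need = 4;
      if (s[0] == 0xF0) lo = 0x90;      /* reject overlong forms */
      if (s[0] == 0xF4) hi = 0x8F;      /* reject values above U+10FFFF */
    }
  else
    return 0;                           /* stray continuation or bad lead byte */

  if (len < need)
    return 0;                           /* truncated sequence */
  if (!in_range(s[1], lo, hi))
    return 0;
  for (i = 2; i < need; i++)
    if (!in_range(s[i], 0x80, 0xBF))
      return 0;
  return need;
}
```

Under that reading, a check of this kind returns 0 for each of the four sequences, so a caller could use a nonzero result to decide which bytes are safe to hand to pcre_exec when PCRE_NO_UTF8_CHECK is set.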