bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales

2014-09-26 Thread Paul Eggert
Zoltán Herczeg wrote:
> Just consider these two examples, where \x9c is an incorrectly encoded unicode codepoint:
> /(?<=\x9c)#/
> Does it match \xd5\x9c# starting from #?
No, because the input does not contain a \x9c encoding error. Encoding errors match only themselves, not parts of other cha

bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales

2014-09-26 Thread Zoltán Herczeg
Hi, this is a very interesting discussion.
>> /(?<=\x9c)#/
>>
>> Does it match \xd5\x9c# starting from #?
>
> No, because the input does not contain a \x9c encoding error. Encoding errors
> match only themselves, not parts of other characters. That is how the glibc
> matchers behave, and it's wh

bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales

2014-09-26 Thread Paul Eggert
On 09/26/2014 11:04 AM, Zoltán Herczeg wrote:
> this is a very interesting discussion.
Yes, I have a lot of other things I'm *supposed* to be doing, but this thread is more fun
>> /(?<=\x9c)#/
>> Does it match \xd5\x9c# starting from #?
> No, because the input does not contain a \x9c encoding e
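
The rule debated in this thread ("encoding errors match only themselves, not parts of other characters") can be illustrated with a small sketch. This is not PCRE or the glibc matcher: it uses Python's `re` module, and it assumes that Python's `surrogateescape` error handler is an acceptable stand-in for "treat each invalid byte as its own unmatched unit". The function names are hypothetical. The bytes \xd5\x9c form a valid two-byte UTF-8 sequence (U+055C), so the \x9c inside it is not an encoding error, whereas a lone \x9c is.

```python
import re


def lookbehind_matches(raw: bytes) -> bool:
    """Character-aware view: decode first, then match.

    Assumption: surrogateescape models "an encoding error matches only
    itself". An invalid byte 0x9c becomes the lone surrogate U+DC9C,
    while a valid multi-byte sequence becomes one ordinary character.
    """
    text = raw.decode("utf-8", errors="surrogateescape")
    # Look behind for the escaped form of the stray byte 0x9c.
    pattern = re.compile("(?<=\udc9c)#")
    return pattern.search(text) is not None


def bytewise_lookbehind_matches(raw: bytes) -> bool:
    """Byte-oriented view: no notion of characters at all."""
    return re.search(rb"(?<=\x9c)#", raw) is not None


if __name__ == "__main__":
    valid = b"\xd5\x9c#"   # U+055C followed by '#'
    broken = b"\x9c#"      # stray continuation byte followed by '#'
    for sample in (valid, broken):
        print(sample, lookbehind_matches(sample),
              bytewise_lookbehind_matches(sample))
```

Under these assumptions the character-aware matcher rejects `\xd5\x9c#` and accepts `\x9c#`, while the byte-oriented matcher accepts both, which is the difference in behavior the question about `/(?<=\x9c)#/` is probing.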