On Fri, Jan 6, 2023 at 7:49 PM Carlo Arenas <care...@gmail.com> wrote:
> Reported to PCRE[1] with mention of GNU grep being also affected.
>
> [1] https://github.com/PCRE2Project/pcre2/issues/185

Yikes. This is a big deal.
Thank you for the patch and added test.

I made a tiny comment tweak and the following test logic change, which
was required to make the new test pass with the fixed version:

  -grep -Po 'r\w' in > out && fail=1
  +grep -Po 'r\w' in > out || fail=1

Also, "make syntax-check" required changing, e.g.,

  -compare out exp || fail=1
  +compare exp out || fail=1

Every bug fix needs a NEWS entry, so I added this:

  With -P, some non-ASCII UTF8 characters were not recognized as
  word-constituent due to our omission of the PCRE_UCP flag. E.g., given
  f(){ echo Perú|LC_ALL=en_US.UTF-8 grep -Po "$1"; } and
  this command, echo $(f 'r\w'):$(f '.\b'), before it would print ":r".
  After the fix, it prints the correct results: "rú:ú".

Finally, I expanded the ChangeLog entry and gave credit where due.
I'll push this tomorrow:

grep-pcre-fix.diff
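
For anyone who wants to see the difference without building a patched grep, here is a rough analogy in Python. It is only a sketch: it uses Python's re module rather than PCRE, and it assumes that re.ASCII behaves like matching without the UCP flag (\w and \b consider ASCII only), while the default Unicode mode behaves like matching with it. The helper names matches and report are made up for this illustration.

```python
import re

TEXT = "Perú"  # same input as in the NEWS entry


def matches(pattern: str, text: str, unicode_aware: bool) -> str:
    """Return all matches joined by spaces, roughly like `echo $(grep -o ...)`."""
    # Assumption: re.ASCII stands in for "no UCP flag"; the default
    # Unicode-aware mode stands in for "UCP flag set".
    flags = 0 if unicode_aware else re.ASCII
    return " ".join(re.findall(pattern, text, flags))


def report(unicode_aware: bool) -> str:
    """Mimic `echo $(f 'r\\w'):$(f '.\\b')` from the NEWS entry."""
    word_match = matches(r"r\w", TEXT, unicode_aware)
    boundary_match = matches(r".\b", TEXT, unicode_aware)
    return word_match + ":" + boundary_match


if __name__ == "__main__":
    print("ASCII-only \\w and \\b:   ", report(unicode_aware=False))
    print("Unicode-aware \\w and \\b:", report(unicode_aware=True))
```

In ASCII-only mode "ú" is not a word character, so 'r\w' finds nothing and the only word boundary after a character falls right after "r". In Unicode-aware mode "ú" is a word character, so 'r\w' matches "rú" and the boundary moves to the end of the string.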