On Sun, Nov 20, 2016 at 2:59 PM, Stephane Chazelas
<stephane.chaze...@gmail.com> wrote:
> 2016-11-20 21:50:28 +0000, Stephane Chazelas:
>> $ locale charmap
>> GB18030
>> $ printf '\uC9\n' | grep '.*7' | hd
>> 00000000  81 30 87 37 0a  |.0.7.|
>> 00000005
>>
>> U+00C9's encoding does end in the 0x37 byte (7 in ASCII and GB18030).
> [...]
>> Reproduced with 2.25, 2.26 and the current git head on ubuntu 16.04 amd64.
> [...]
>
> Same behaviour with 2.26 on Solaris 11.

Thank you for the report.
I can reproduce that error on Fedora 25 with this:

  $ (export LC_ALL=zh_CN.gb18030; printf '\uC9\n' > k; grep '.*7' k)|wc -c
  5

I confirmed that the problem does not arise (i.e., no match, with exit
status of 1) when we force the use of glibc's regex matcher by inserting
a trivial back-reference:

  $ (export LC_ALL=zh_CN.gb18030; printf '\uC9\n' > k; grep -E '()\1.*7' k); echo $?
  1

This bisected to v2.18-54-g3ef4c8e, but that commit was just the
messenger: it exposed the latent bug by making it so this case was no
longer handled by glibc's regex matcher, but rather by grep's dfa.c.
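
For anyone without a GB18030 locale installed, here is a rough sketch of why a matcher that looks at bytes instead of characters would report a match. It writes the four bytes from the quoted hd output directly with octal escapes, then searches them byte-wise. The temporary file and the variable names (f, hex, bytewise) are only illustrative, and the C-locale grep merely stands in for a byte-oriented matcher; it does not reproduce what dfa.c actually does.

```shell
# Bytes of U+00C9 in GB18030, as shown in the quoted hd output: 81 30 87 37.
# Octal escapes are used so that no GB18030 locale has to be installed.
f=$(mktemp)
printf '\201\060\207\067\n' > "$f"

# Dump the line as hex; the byte just before the newline (0a) is 37, ASCII '7'.
hex=$(od -An -tx1 "$f" | tr -d ' \n')
echo "$hex"

# In the C locale every byte is its own character, so the trailing 0x37
# is seen as a literal '7' and the line is counted as matching.
bytewise=$(LC_ALL=C grep -a -c '7' "$f")
echo "$bytewise"

rm -f "$f"
```

In a GB18030 locale those four bytes form a single character, so a correct matcher should find no 7 on that line, which is what the back-reference variant above reports.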