$ locale charmap
GB18030
$ printf '\uC9\n' | grep  '.*7'  | hd
00000000  81 30 87 37 0a                                    |.0.7.|
00000005

U+00C9's encoding does end in the 0x37 byte (7 in ASCII and GB18030).

$ printf '\uC9\n' | grep  '.*0'

fails.

$ printf '\uC9\n' | grep  -o '.*7'

returns with a zero exit status but outputs nothing. It's as if
.*7 matched an empty string somewhere.

printf '\uC9\n' | grep  '\(.*7\)\1'

fails.

so do:

grep 7
grep '7$'
grep '.7'
grep '[^x]*7'
printf 'x\uC9\n' | grep -E '.+7'

These match:

grep '.\{0,1\}7'
grep -E '.?7'
printf '\uC9x\n' | grep  '.*7x' # still outputs nothing with -o

That's not confined to GB18030. You get similar issues with
BIG5-HKSCS, BIG5 or GBK.

$ locale charmap
BIG5-HKSCS
$ printf '\ue9\n' | grep  '.*m'  | hd
00000000  88 6d 0a                                          |.m.|
00000003

Reproduced with 2.25, 2.26 and the current git head on ubuntu 16.04 amd64.

-- 
Stephane



Reply via email to