On another machine with grep 3.1 this does not appear to be the case, so, regression?
On Thu, 5 December 2019 at 18:30, jan h (<jharal...@gmail.com>) wrote:
>
> grep 3.3
>
> I get a few weird symbols (seems valid utf-8), along with normal
> numbers with the following simple snippet (.UTF-8 and .utf8 result in
> same, even .UtF---8 is the same):
> LC_ALL=en_US.UTF-8 grep -o "[0-9]" -a /dev/urandom|head -n 1024|tr -d "\n"
> wc -c counts 1047 and 1033 and 1036 etc, so they're multi-byte characters
> meanwhile, with LC_ALL being C.UTF-8 this is not the case,
> LC_ALL=C.UTF-8 grep -o "[0-9]" -a /dev/urandom|head -n 1024|tr -d "\n"|wc -c
> consistently results in 1024 characters/bytes, as it's supposed to be...
> it's not just en_US, it seems ANY utf-8 locale, other than C results
> in this bug, whereas non-utf8 versions are fine, bare en_US doesn't
> show this bug, nor does en_US.iso88591...
>
> worthy of note is that [[:digit:]] works correctly, while [0-9] does
> not (and 1-9 is same bug as 0-9, if you were wondering), setting -E
> doesn't change anything either...
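
Since /dev/urandom yields different data on every run, the byte counts above are hard to compare between machines or grep versions. A minimal sketch of a repeatable comparison, assuming a POSIX shell and that the listed locales are installed (if one is missing, grep may quietly fall back to the C locale), could look like this. The helper name `count_match_bytes` and the 1 MiB sample size are my own choices for illustration, not anything from the report:

```shell
#!/bin/sh
# Hypothetical helper (not part of grep): count_match_bytes LOCALE PATTERN FILE
# Prints how many bytes `grep -a -o PATTERN` emits for FILE under LOCALE,
# after the newlines between matches have been stripped.
count_match_bytes() {
    LC_ALL="$1" grep -a -o "$2" "$3" | tr -d '\n' | wc -c | tr -d ' '
}

# Capture one fixed sample so every locale/pattern pair sees identical input.
# The 1 MiB size is arbitrary.
sample=$(mktemp)
head -c 1048576 /dev/urandom > "$sample"

# Locale names are assumptions; adjust to whatever `locale -a` lists.
for loc in C C.UTF-8 en_US.UTF-8; do
    for pat in '[0-9]' '[[:digit:]]'; do
        printf '%-12s %-12s %s\n' "$loc" "$pat" \
            "$(count_match_bytes "$loc" "$pat" "$sample")"
    done
done

rm -f "$sample"
```

If the two patterns give different counts for the same locale on the same file, that would suggest the extra bytes come from `[0-9]` matching non-ASCII characters in that locale. Running the same script against grep 3.1 and 3.3 should then show whether the behaviour changed between versions.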