bug#38503: Locale can cause incorrect number parsing in binary files

2019-12-05 Thread Paul Eggert
On 12/5/19 12:40 PM, Eric Blake wrote:
> I'm not sure of the current state of whether grep tries to use RRI on all systems or only on systems where it relies on gnulib's regcomp instead of libc.

As I recall, grep doesn't make any special effort to use RRI. That is, if the underlying library use

bug#38503: Locale can cause incorrect number parsing in binary files

2019-12-05 Thread Eric Blake
On 12/5/19 12:55 PM, jan h wrote:
> compiling from scratch resulted in a normal, working version apparently Arch's package was somehow badly made?

You also need to check whether your builds were using gnulib's regcomp replacement, or sticking with the one from glibc; and in turn which version o

bug#38503: Locale can cause incorrect number parsing in binary files

2019-12-05 Thread jan h
Compiling from scratch resulted in a normal, working version; apparently Arch's package was somehow badly made?

bug#38503: Locale can cause incorrect number parsing in binary files

2019-12-05 Thread jan h
On another machine with grep 3.1 this does not appear to be the case, so, regression?

On Thu, 5 December 2019 at 18:30, jan h wrote:
> grep 3.3
>
> I get a few weird symbols (seems valid utf-8), along with normal numbers with the following simple snippet (.UTF-8 and .utf8

bug#38503: Locale can cause incorrect number parsing in binary files

2019-12-05 Thread Eric Blake
On 12/5/19 2:29 PM, Eric Blake wrote:
> tag 38503 notabug
> thanks
>
> On 12/5/19 12:30 PM, jan h wrote:
>> grep 3.3

Note that the Rational Range Interpretation of ranges claims that [0-9] should have the expansion [0123456789] in ALL locales; and more and more versions of GNU utilities are starting
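
A minimal sketch of what that expansion means in practice, assuming a POSIX shell with grep and tr available. The sample string is made up for illustration. In the C locale, ranges follow byte order, so [0-9] can only match the ten ASCII digits, which is the behaviour RRI asks for in every locale:

```shell
# Illustrative sample: ASCII digits 1, 2 and 7 mixed with letters and with
# U+0663 (ARABIC-INDIC DIGIT THREE), encoded in UTF-8 as the bytes d9 a3.
sample=$(printf 'a1b2\331\243c7')

# LC_ALL=C makes grep compare range endpoints by byte value; -a keeps grep
# from treating the non-ASCII bytes as a reason to suppress output.
digits=$(printf '%s' "$sample" | LC_ALL=C grep -a -o '[0-9]' | tr -d '\n')

# Only the ASCII digits survive.
printf '%s\n' "$digits"
```

Running the same pipeline under a UTF-8 locale and comparing the two outputs is one way to see whether a given grep build applies RRI.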

bug#38503: Locale can cause incorrect number parsing in binary files

2019-12-05 Thread Eric Blake
tag 38503 notabug
thanks

On 12/5/19 12:30 PM, jan h wrote:
> grep 3.3
>
> I get a few weird symbols (seems valid utf-8), along with normal numbers with the following simple snippet (.UTF-8 and .utf8 result in same, even .UtF---8 is the same):
>
> LC_ALL=en_US.UTF-8 grep -o "[0-9]" -a /dev/urandom|head -n

bug#38503: Locale can cause incorrect number parsing in binary files

2019-12-05 Thread jan h
grep 3.3

I get a few weird symbols (seems valid utf-8), along with normal numbers with the following simple snippet (.UTF-8 and .utf8 result in same, even .UtF---8 is the same):

LC_ALL=en_US.UTF-8 grep -o "[0-9]" -a /dev/urandom|head -n 1024|tr -d "\n"

wc -c counts 1047 and 1033 and 1036 etc,
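
The byte counts above exceed 1024 because some matched characters occupy more than one byte. A rough way to measure that excess directly is sketched below, assuming a POSIX shell; `count_stray_bytes` is a hypothetical helper name, and the fixed input stands in for real grep -o output so the result is repeatable:

```shell
# Hypothetical helper: count bytes on stdin that are neither ASCII 0-9
# nor newline. LC_ALL=C makes tr treat 0-9 as a plain byte range.
count_stray_bytes() {
    LC_ALL=C tr -d '0-9\n' | wc -c | tr -d ' '
}

# Stand-in for grep -o output: two ASCII digits on their own lines, then
# one line holding the two-byte UTF-8 sequence d9 a3 (U+0663).
stray=$(printf '1\n2\n\331\243\n' | count_stray_bytes)

# The two bytes of the non-ASCII digit are the only ones left over.
printf '%s\n' "$stray"
```

Piping the output of the reported grep and head pipeline (before the final tr) into the helper should give 0 when only ASCII digits are matched.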