On 02/29/2016 04:35 PM, Paul Eggert wrote:
> I suggest using -a. LC_ALL=C won't work the way that you want on
> platforms where the C locale is UTF-8, or is pure ASCII. For example, on
> Fedora 23 or RHEL 7 with grep 2.23 we have:
>
> $ printf '\200\n' | LC_ALL=C grep .
> Binary file (standard input) matches
>
> This is because the C locale is pure ASCII on these platforms, i.e.,
> '\200' is not a valid character the way it is with traditional Unix. I
> don't know why Red Hat made that change.
I _think_ the Austin Group is leaning towards requiring the "C" locale to always be a unibyte locale in which all 256 bytes are valid characters. Under that rule, neither strict 7-bit ASCII nor UTF-8 could serve as the "C" locale. For that to happen, though, POSIX would also need to provide an easy way to get a UTF-8 locale, and to describe how that locale differs from the "C" locale.

But what the final result will be is still all conjecture. Even within the standards committee, deciding how much locale corner-case behavior to pin down versus how much latitude to leave implementations is tricky business. And any such change is at least 3 or 4 years away from being standardized in Issue 8 (right now, the focus is on Technical Corrigendum 2 for Issue 7).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org