On 02/29/2016 01:11 PM, Marcello Perathoner wrote:
>> Yes, locale dependencies on standard behavior can be annoying.
>>
>
> You assume that a user will only ever want to grep text files encoded in
> the machine's locale. That is not so.

You've been relying on undefined behavior, and it caught up with you.
It's the same as asking for us to keep use-after-free "working" in a
multithreaded program because it has always "worked" in your older
single-threaded program when nothing was perturbing the memory between
free() and its latent use. A latent bug in your usage is still a bug in
your usage, even if it took a change in grep's defaults to expose your
problem.

And meanwhile, newer grep 2.23 has improved the heuristics to only
complain about a binary file if it would otherwise be outputting
encoding errors (rather than blindly complaining about the encoding
error up front and stopping processing immediately), which does
alleviate some of the worst of the change caused by your undefined usage
(that is, you can still grep for valid encodings, and get reasonable
results so long as the valid text doesn't mix with lines with invalid
encodings).

>
> As a German user I have on my disk files in many encodings: utf-8,
> iso-8859-1, win-1252, iso-8859-15, encodings that are now defunct like
> CP850, CP847, "German 7-bit ASCII" that replaced braces with Umlauts,
> old WordStar files that used control characters inside.
>
> Since 2.21 I will now have to always specify -a or LC_ALL=C when
> grepping my files.

Yes, but then you are no longer relying on undefined behavior, and
therefore have a leg to stand on if we break that behavior.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
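
For illustration only, here is a rough Python sketch of the heuristic described above for grep 2.23: a file is only reported as binary when a line that would be output contains an encoding error. This is not grep's actual implementation. The names `grep_sketch` and `is_valid_utf8` and the `text_mode` parameter (standing in for -a) are hypothetical, a UTF-8 locale is assumed, the message text is simplified, and stopping at the first undecodable matching line is an assumption.

```python
from typing import List


def is_valid_utf8(line: bytes) -> bool:
    """Return True if the line decodes cleanly in the assumed UTF-8 locale."""
    try:
        line.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def grep_sketch(data: bytes, pattern: bytes, text_mode: bool = False) -> List[str]:
    """Hypothetical model of the behavior described in the mail.

    Only a line that matches AND contains an encoding error triggers the
    binary-file message. text_mode=True plays the role of -a and forces
    every matching line to be output as text.
    """
    out: List[str] = []
    for line in data.split(b"\n"):
        if pattern not in line:
            continue
        if not text_mode and not is_valid_utf8(line):
            # Assumption: report once and stop, rather than emit bad bytes.
            out.append("Binary file matches")
            break
        out.append(line.decode("utf-8", errors="replace"))
    return out


# One ASCII line, one ISO-8859-1 line (invalid as UTF-8), one more ASCII line.
sample = b"plain ascii match\nGr\xfc\xdfe match\nlast match\n"

print(grep_sketch(sample, b"plain"))                  # only valid lines match
print(grep_sketch(sample, b"match"))                  # hits the invalid line
print(grep_sketch(sample, b"match", text_mode=True))  # -a style: all lines
```

Searching for text that only occurs on validly encoded lines gives ordinary results, while a match on the ISO-8859-1 line produces the binary-file message unless text mode is forced.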