Greg Wooledge (12020-04-30):
> For the first part, you want LC_NUMERIC=C.

Damn right I want LC_NUMERIC=C. And I want LC_COLLATE=C. And I want LC_NUMERIC=C, and LC_TIME=C. And I also want LC_WHATEVER_THE_HECK_THEY_WILL_INVENT_NEXT=C too. I want LC_EVERYTHING=C except LC_CTYPE. And everybody who works with command-line tools should want the same, because these things were ugly mistakes with way more drawbacks than benefits.

> For the second part, what you're asking for is sometimes called
> "rational ranges", or "rational range interpretation". This is the
> notion that, for specific range expressions like '[a-z]' within a
> regular expression or glob, the software will assume you want to
> match only '[[:lower:]]', rather than doing what you actually said.
>
> The idea behind this is based on the (probably accurate) belief that
> most people who write [a-z] or [A-Z] in their scripts wanted the
> LC_COLLATE=C (or 1980s) meaning of the range, not the meaning of the
> range in modern times. Thus, it's a sort of safety net strung below
> the novice programmer, to catch them when they fall.
>
> Since this is not how systems currently behave, however, what you need
> to do in your script is write the expression correctly. For your
> example, I believe that would be [[:xdigit:]]. Or if you really do
> mean to restrict it to lower-case 'a' through 'f', retain what you
> have, but set LC_COLLATE=C first.
>
> I haven't seen rational range interpretation discussion that covers
> '[a-f]', but I haven't been following it closely.

I am very aware of the pros and cons of localized collation. The thing that was incredibly stupid was to change the semantics of something as fundamental as regular expressions. The correct way to introduce localized alphabetical order in regular expressions would have been to introduce a new notation for it, not to hijack one that was already in use.

Since this ugly mistake has been made, the only sane course of action is to leave the locales at their C setting. Which is exactly what I do, and I am perfectly fine with it.
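A minimal Python sketch of the [a-f] versus [[:xdigit:]] distinction quoted above. The helper `in_collation_range` is hypothetical: it only models the idea that a range like [a-f] means "every character that collates between 'a' and 'f' under the current LC_COLLATE", and it is not how any real regex engine is implemented. `is_xdigit` (also hypothetical) stands in for [[:xdigit:]], which does not depend on collation order.

```python
import locale


def in_collation_range(ch: str, lo: str, hi: str) -> bool:
    """Hypothetical, simplified model of a collation-based range [lo-hi]:
    ch matches if it collates between lo and hi under LC_COLLATE.
    Not the algorithm of any actual regex library."""
    return locale.strcoll(lo, ch) <= 0 and locale.strcoll(ch, hi) <= 0


def is_xdigit(ch: str) -> bool:
    """Stand-in for [[:xdigit:]]: hex digits in either case."""
    return len(ch) == 1 and ch in "0123456789abcdefABCDEF"


if __name__ == "__main__":
    # Under the C locale, collation is plain code-point order.
    locale.setlocale(locale.LC_COLLATE, "C")
    for ch in "cC9g":
        print(ch, in_collation_range(ch, "a", "f"), is_xdigit(ch))
```

With LC_COLLATE=C, 'c' matches both, 'C' and '9' match only the [[:xdigit:]] stand-in, and 'g' matches neither. Replacing "C" with a locale such as en_US.UTF-8 (if one is installed) may change the result for 'C', depending on that locale's collation data, which is exactly the surprise described in the quoted text.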

And I can laugh to myself every time somebody comes complaining that the output of a command is ugly because nobody considered that localization would break the alignment, or that a script is broken because the matching of a regexp has changed.

Regards,

-- 
  Nicolas George