On 28/06/2013 14:49, Eli Zaretskii wrote:
> > > When being consistent means being buggy, I don't want the consistency.
> > > I want the bug solved in all the programs I use, but if it takes time
> > > to do that, I will be glad in the meantime to use some programs that
> > > don't have that bug, i.e. are "inconsistent".
> >
> > I will be less glad to move a regex or piece of code from one to
> > another, and find inconsistency.
>
> You should report a bug in that case.

In the case of sed, I'll gladly direct the reporter to the "Non-bugs"
section of the manual, which also explains why you should use LC_ALL=C
anyway:

`[a-z]' is case insensitive
`s/.*//' does not clear pattern space

    You are encountering problems with locales. POSIX mandates that
    `[a-z]' uses the current locale's collation order -- in C parlance,
    that means strcoll(3) instead of strcmp(3). Some locales have a
    case insensitive strcoll, others don't.

    Another problem is that [a-z] tries to use collation symbols. This
    only happens if you are on the GNU system, using GNU libc's regular
    expression matcher instead of compiling the one supplied with GNU
    sed. In a Danish locale, for example, the regular expression
    `^[a-z]$' matches the string `aa', because `aa' is a single
    collating symbol that comes after `a' and before `b'; `ll' behaves
    similarly in Spanish locales, or `ij' in Dutch locales.

    Another common localization-related problem happens if your input
    stream includes invalid multibyte sequences. POSIX mandates that
    such sequences are _not_ matched by `.', so that `s/.*//' will not
    clear pattern space as you would expect. In fact, there is no way
    to clear sed's buffers in the middle of the script in most
    multibyte locales (including UTF-8 locales). For this reason, GNU
    sed provides a `z' command (for `zap') as an extension.

    However, to work around both of these problems, which may cause
    bugs in shell scripts, you can set the LC_ALL environment variable
    to `C', or set the locale on a more fine-grained basis with the
    other LC_* environment variables.

Paolo
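
To make the workaround from the manual excerpt concrete, here is a
minimal shell sketch. It assumes a POSIX sh and any sed on the PATH;
the variable names lower_only and cleared and the sample input are made
up for illustration. LC_ALL=C is set only for the sed process, so the
rest of the script keeps whatever locale it had.

```shell
#!/bin/sh
# Sketch: force the C locale for individual sed invocations so that
# bracket ranges and `.' work on plain bytes, not on locale rules.

# In the C locale, [a-z] covers only the lowercase ASCII letters, so
# neither `B' nor the two-character line `aa' can match ^[a-z]$.
lower_only=$(printf 'a\nB\naa\n' | LC_ALL=C sed -n '/^[a-z]$/p')
printf 'lines matched by ^[a-z]$: %s\n' "$lower_only"

# In the C locale every byte is a valid character, so `.*' spans the
# whole line even though it contains the byte 0xFF (octal 377), which
# would be an invalid sequence in a UTF-8 locale.
cleared=$(printf 'x\377y\n' | LC_ALL=C sed 's/.*//')
printf 'pattern space after s/.*//: [%s]\n' "$cleared"
```

Running the same two pipelines without LC_ALL=C may or may not give
different results, depending on the locale in effect and on which
regular expression matcher sed was built with, which is exactly the
variability the manual describes.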