On Jun 25 18:09, Corinna Vinschen wrote:
> On Jun 25 18:03, Corinna Vinschen wrote:
> > On Jun 25 15:38, Lavrentiev, Anton (NIH/NLM/NCBI) [C] wrote:
> > > > Your locale is zh_CN.UTF-8. What you're expecting is only guaranteed
> > > > in the C locale:
> > > [...]
> Which also means, AFAICS, Cygwin's sed is doing it right, Linux' sed
> is doing it wrong.

Yes, that puzzles me a bit at the moment, too.
I had a discussion with my colleagues from the Linux side of Red Hat.
The bottom line is, we're both doing it right, just differently.

As for the difference itself, here's what happened: The gawk maintainer
was unhappy with how regex ranges worked in locales other than the C
locale. So he implemented a change to regex which he called "rational
ranges". The idea is that something like [b-d] always means lowercase
only and [B-D] always means uppercase only, independent of the locale
we're in. This change to the regex handling made it not only into
gawk(*), but also into glibc(**) and perl regex. It did not make it into
sed or bash, for instance.

That's why sed under Cygwin shows the default, collation-abiding
behaviour when using a non-C locale. Under Fedora 18 it shows the new
"rational ranges" behaviour, because glibc supports them and sed has
been built with the --without-included-regex option.

I just checked the new upstream sed 4.2.2 (will upload shortly) and it
still doesn't implement "rational ranges", even though its regex is
derived from gnulib's regex.


Corinna

(*) Try echo abcdeABCDE | awk '{ gsub(/[B-D]/, "_"); print }'
(**) http://sourceware.org/ml/libc-alpha/2012-12/msg00456.html

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
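A minimal sketch of the one case that behaves the same everywhere: with the C locale forced, [B-D] matches only B, C and D in both sed and awk. It assumes a POSIX shell with sed and awk on the PATH. The helper names sed_c_locale and awk_c_locale are hypothetical, chosen for illustration only. The non-C-locale command is left as a comment because, as explained above, its output depends on whether the regex implementation in use supports "rational ranges".

```shell
#!/bin/sh
# Sketch: bracket ranges under the C locale, where [B-D] is
# guaranteed to mean exactly the characters B, C and D.

input='abcdeABCDE'

# Hypothetical helper: run sed with LC_ALL=C for this command only.
sed_c_locale() {
    printf '%s\n' "$1" | LC_ALL=C sed 's/[B-D]/_/g'
}

# Hypothetical helper: same substitution with awk's gsub(), as in (*).
awk_c_locale() {
    printf '%s\n' "$1" | LC_ALL=C awk '{ gsub(/[B-D]/, "_"); print }'
}

sed_c_locale "$input"   # prints abcdeA___E
awk_c_locale "$input"   # prints abcdeA___E

# In a non-C locale such as zh_CN.UTF-8 the result depends on the
# regex engine (collation order vs. "rational ranges"), so no fixed
# output is assumed here:
#   printf '%s\n' "$input" | LC_ALL=zh_CN.UTF-8 sed 's/[B-D]/_/g'
```

Forcing LC_ALL on the individual command, rather than exporting it, keeps the rest of the session in the user's own locale.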