> On 06.11.2016 at 12:07, Baptiste Daroussin <b...@freebsd.org> wrote:
>
> On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
>> I happened to run an old script today that uses sed(1) to extract the system
>> boot time from the kern.boottime sysctl MIB. On 11.0 this no longer works as
>> expected:
>>
>> $ sysctl kern.boottime
>> kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016
>> $ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
>> v 5 16:18:34 2016
>>
>> sed passes over 'S' and 'N' until it hits 'v', which it apparently considers
>> uppercase. This is with LANG=en_US.UTF-8. If I set LANG=C, it works as
>> expected:
>>
>> $ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
>> Nov 5 16:18:34 2016
>>
>> Testing every lowercase character separately gives even more inconsistent
>> results:
>>
>> $ cat <<! | LANG=en_US.UTF-8 sed -n -e '/^[A-Z]$/p'
>> Here sed thinks every lowercase character except for 'a' is uppercase! This
>> differs from the first test, where sed did not think 'o' is uppercase. Again,
>> the above behaves as expected with LANG=C.
>>
>> Does anyone have any insight into this? This is likely to break a lot of
>> existing code.
>>
>
> Yes, A-Z only means uppercase in an ASCII-only world. In a Unicode world it
> means AaBb...Z, because there are far more characters than simply A-Z. In
> FreeBSD 11 we have a Unicode collation instead of falling back on
> LC_COLLATE=C, which means ASCII only.
>
> For regexps, for example, one should use the classes [:upper:] or [:lower:].

That is rather surprising. Is there a normative reference for the treatment of
bracket expressions and character classes when using locales other than C
and/or encodings like UTF-8?

Stefan

-- 
Stefan Bethke <s...@lassitu.de>   Phone +49 151 14070811

_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
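
For illustration, the advice quoted above (use a character class rather than
the A-Z range) could be applied to the original one-liner roughly as follows.
This is only a sketch: the sample line is copied from the report instead of
being read from sysctl, so that it runs anywhere, and the variable names
(line, result_class, result_c) are made up for this example. The second
variant pins the locale with LC_ALL=C, which is an assumption about what a
script might prefer; LC_ALL overrides both LANG and LC_COLLATE, whereas the
LANG=C used in the report only takes effect if LC_ALL and LC_COLLATE are unset.

```shell
#!/bin/sh
# Sample line copied from the original report. On FreeBSD this would be the
# output of `sysctl kern.boottime`; it is hard-coded here for portability.
line='kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016'

# Variant 1: [[:upper:]] is defined by LC_CTYPE, not by collation order, so it
# should match only uppercase letters whatever LC_COLLATE is in effect.
result_class=$(printf '%s\n' "$line" | sed -e 's/.*\([[:upper:]].*\)$/\1/')

# Variant 2: keep the A-Z range but force the C locale for this one command,
# where the range covers exactly the ASCII uppercase letters.
result_c=$(printf '%s\n' "$line" | LC_ALL=C sed -e 's/.*\([A-Z].*\)$/\1/')

printf '%s\n' "$result_class"
printf '%s\n' "$result_c"
```

Both commands should print the text starting at the last uppercase letter of
the line, that is, the "Nov ..." part that the old script expected.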