On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
> I happened to run an old script today that uses sed(1) to extract the system
> boot time from the kern.boottime sysctl MIB. On 11.0 this no longer works as
> expected:
>
> $ sysctl kern.boottime
> kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016
> $ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
> v 5 16:18:34 2016
>
> sed passes over 'S' and 'N' until it hits 'v', which it considers uppercase
> apparently. This is with LANG=en_US.UTF-8. If I set LANG=C, it works as
> expected:
>
> $ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
> Nov 5 16:18:34 2016
>
> Testing every lowercase character separately gives even more inconsistent
> results:
>
> $ cat <<! | LANG=en_US.UTF-8 sed -n -e '/^[A-Z]$/'p
> > a
> > b
> > c
> > d
> > e
> > f
> > g
> > h
> > i
> > j
> > k
> > l
> > m
> > n
> > o
> > p
> > q
> > r
> > s
> > t
> > u
> > v
> > w
> > x
> > y
> > z
> > !
> b
> c
> d
> e
> f
> g
> h
> i
> j
> k
> l
> m
> n
> o
> p
> q
> r
> s
> t
> u
> v
> w
> x
> y
> z
>
> Here sed thinks every lowercase character except for 'a' is uppercase! This
> differs from the first test where sed did not think 'o' is uppercase. Again,
> the above behaves as expected with LANG=C.
>
> Does anyone have any insight into this? This is likely to break a lot of
> existing code.
>
Yes, A-Z only means uppercase in an ASCII-only world. In a Unicode world it
means AaBb... Z, because there are far more characters than simply A-Z.

In FreeBSD 11 we have a Unicode collation instead of falling back on
LC_COLLATE=C, which means ASCII only.

For regexps, for example, one should use the character classes [:upper:] or
[:lower:].

Best regards,
Bapt
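
As a rough sketch of the character-class approach mentioned above: the helper name last_upper_tail is made up for illustration, and the sample line is the sysctl output copied from the original report rather than read from a live system. Inside a bracket expression the class is written [[:upper:]].

```sh
# Hypothetical helper (the name is illustrative only): print everything from
# the last uppercase letter on each input line to the end of that line.
# [[:upper:]] is a POSIX character class, so it does not depend on how the
# current locale orders the range A-Z.
last_upper_tail() {
    sed -e 's/.*\([[:upper:]].*\)$/\1/'
}

# Sample input copied from the report, instead of calling sysctl.
sample='kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016'
printf '%s\n' "$sample" | last_upper_tail

# Same idea for the per-character test: only the uppercase letter matches.
printf '%s\n' a B c | sed -n -e '/^[[:upper:]]$/p'
```

With the sample line this should print "Nov 5 16:18:34 2016" followed by "B", whether the locale is en_US.UTF-8 or C.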