2016-11-06 12:07, Baptiste Daroussin wrote:
Yes, A-Z means uppercase only in an ASCII-only world. In a Unicode
world it means AaBb...Z, because there are far more characters than
simply A-Z. In FreeBSD 11 we have a Unicode collation instead of
falling back on LC_COLLATE=C, which means ASCII only.
For regexps, for example, one should use the character classes
[:upper:] or [:lower:].
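As a minimal sketch of that suggestion, the sed command from the original report can be written with a POSIX character class in place of the range. The sample line is copied from the sysctl output quoted further down; the variable names line and boot_date are only illustrative.

```shell
# Sample input copied from the sysctl output quoted in this thread.
line='kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016'

# [[:upper:]] is a character class, not a range, so it does not depend
# on collation order the way [A-Z] does.
boot_date=$(printf '%s\n' "$line" | sed -e 's/.*\([[:upper:]].*\)$/\1/')
printf '%s\n' "$boot_date"
```

Because the leading .* is greedy, the captured text starts at the last uppercase letter on the line, which here is the 'N' of "Nov".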
It is a good idea to keep LC_COLLATE and LC_NUMERIC (and LC_MONETARY?)
at "C" when LANG or LC_CTYPE is set to something else, otherwise
unexpected things may happen.
Mark
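A minimal sketch of that advice, scoped to a single command: LANG stays at a UTF-8 locale while LC_COLLATE is pinned to "C". The three-line input and the variable name matched are only illustrative. LC_ALL, if set, overrides every LC_* variable, so the sketch clears it first.

```shell
# LC_ALL takes precedence over LC_COLLATE, so make sure it is not set.
unset LC_ALL

# Keep a UTF-8 LANG but force byte-order collation for this one command.
matched=$(printf 'a\nB\nz\n' | LANG=en_US.UTF-8 LC_COLLATE=C sed -n -e '/^[A-Z]$/p')
printf '%s\n' "$matched"
```

With LC_COLLATE=C the range [A-Z] covers only the ASCII uppercase letters, so only the line "B" is printed. The same two assignments could instead be exported from a login profile to apply them to a whole session.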
On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
I happened to run an old script today that uses sed(1) to extract the
system boot time from the kern.boottime sysctl MIB. On 11.0 this no
longer works as expected:
$ sysctl kern.boottime
kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016
$ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
v 5 16:18:34 2016
sed passes over 'S' and 'N' until it hits 'v', which it apparently
considers uppercase. This is with LANG=en_US.UTF-8. If I set LANG=C,
it works as expected:
$ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
Nov 5 16:18:34 2016
Testing every lowercase character separately gives even more
inconsistent results:
$ cat <<! | LANG=en_US.UTF-8 sed -n -e '/^[A-Z]$/'p
> a
> b
> c
> d
> e
> f
> g
> h
> i
> j
> k
> l
> m
> n
> o
> p
> q
> r
> s
> t
> u
> v
> w
> x
> y
> z
> !
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
Here sed thinks every lowercase character except for 'a' is uppercase!
This differs from the first test where sed did not think 'o' is
uppercase. Again, the above behaves as expected with LANG=C.
Does anyone have any insight into this? This is likely to break a lot
of existing code.
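One possible reading of the two results above, offered as a sketch rather than a confirmed diagnosis: they are consistent with each other if [A-Z] under en_US.UTF-8 matched every letter except 'a', and the greedy .* in the first command then left only the last such letter on the line, the 'v' of "Nov", for the bracket expression. The set can be spelled out explicitly in the C locale to reproduce the first result. The bracket expression [A-Zb-z] is an assumption derived from the per-letter test, and the variable names line and rest are illustrative.

```shell
# Sample input copied from the sysctl output quoted above.
line='kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016'

# In the C locale, [A-Zb-z] names the set that [A-Z] appeared to match
# in the per-letter test (assumption: every letter except 'a').
# The leading .* is greedy, so the capture starts at the LAST letter
# in that set, not the first.
rest=$(printf '%s\n' "$line" | LC_ALL=C sed -e 's/.*\([A-Zb-z].*\)$/\1/')
printf '%s\n' "$rest"
```

This prints "v 5 16:18:34 2016", the same text as the unexpected output in the first test. Under this reading 'o' is also in the matched set, but the greedy .* consumes it before the bracket expression gets a chance.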
_______________________________________________
freebsd-stable@freebsd.org mailing list