Re: sed command does not behave equal from 10.3 to 11.0

Kimmo Paasiala Wed, 27 Jul 2016 05:34:54 -0700

On Wed, Jul 27, 2016 at 2:55 PM, Tomoaki AOKI <[email protected]> wrote:
> Hi.
>
> There were some collation related changes (*1) between 10.3 and 11.
> So the results can be changed even with the same locale.
>
> *1: For example, r302512.
>   https://lists.freebsd.org/pipermail/svn-src-head/2016-July/088919.html
>
> But I cannot understand why ASCII range of characters are affected with
> UTF-8 encoding.
>
>
> On Wed, 27 Jul 2016 11:19:06 +0200
> José García Juanino <[email protected]> wrote:
>
>> On 27 July 2016 at 11:01, Matthew D. Fuller <[email protected]> wrote:
>> > On Wed, Jul 27, 2016 at 09:45:23AM +0100 I heard the voice of
>> > krad, and lo! it spake thus:
>> >> are you sure you aren't hitting a port or something?
>> >
>> > Locale dependent.
>> >
>> > % echo "abc_ABC.def" | env LANG=C sed -e 's/[^A-Z0-9]//g'
>> > ABC
>> >
>> > % echo "abc_ABC.def" | env LANG=en_US.UTF-8 sed -e 's/[^A-Z0-9]//g'
>> > bcABCdef
>> >
>> > (pre-branch -CURRENT)
>> >
>>
>> The issue is that, under the same locale, the output is not the same
>> in 10.3 as in 11.0. It sounds like a bug to me ...
>> _______________________________________________
>> [email protected] mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "[email protected]"
>>
>
>
> --
> Tomoaki AOKI    [email protected]
> _______________________________________________


If I change the invocation to list the characters explicitly instead of
using a range, I get the correct output:

% echo "abc_ABC.def" | env LANG=en_US.UTF-8 sed -e 's/[^ABC]//g'

Is the real problem that the UTF-8 locale messes up character ranges
(e.g. A-Z) in sed(1)?
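
If the ranges are the culprit, two workarounds come to mind. This is only
a sketch, not a confirmed fix for the 11.0 behaviour: either force the C
locale with LC_ALL (which overrides LANG), so that A-Z is taken in plain
byte order, or avoid ranges entirely by using POSIX character classes. The
variable names out_c and out_class are just placeholders for illustration.

```shell
#!/bin/sh
# Workaround 1: force the C locale so the range A-Z follows ASCII order.
# LC_ALL takes precedence over LANG and the other LC_* variables.
out_c=$(echo "abc_ABC.def" | env LC_ALL=C sed -e 's/[^A-Z0-9]//g')
echo "$out_c"

# Workaround 2: use character classes instead of ranges, so the result
# does not depend on how the locale orders the characters in between.
out_class=$(echo "abc_ABC.def" | sed -e 's/[^[:upper:][:digit:]]//g')
echo "$out_class"
```

Both variants should leave only the upper-case letters and digits of the
ASCII test string, whichever collation order the current locale uses.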

-Kimmo
