On Fri, Jan 9, 2015, at 18:08, FRIGN wrote:
> 
> This is madness. If you want the bytes to be collated,

I don't see where you're getting that either of us wants the bytes to be
collated. I don't even know what you mean by "collated", since collating
is not what tr does, except when ordering ranges.

> you just write the
> literal \50102. 

Even if octal escapes could be more than three digits, I have no idea
what you think 50102 is. Read as octal, its decimal value is 20546 and
its hex value is 0x5042, which has nothing to do with character U+00F6,
whose UTF-8 representation is 0xC3 0xB6... I just realized what you're
doing: 0xC3B6 has the _decimal_ value 50102. I have no idea why you
would think _that_ is a representation people would want to use. If
you're so pro-Unicode, make it accept \u00F6; that's a valid extension.
But reusing the syntax POSIX uses for three-digit octal literals for
arbitrarily long decimal literals that aren't even Unicode code points
makes no sense at all. In what universe is that intuitive?
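
To make the arithmetic concrete, here is a quick sketch in Python (used
purely as a calculator; the variable names are mine and none of this is
tr code). It shows the two readings of "50102": as octal digits, and as
the decimal value of the two UTF-8 bytes of U+00F6 read as one
big-endian integer.

```python
# Reading 1: the digits "50102" taken as an octal number.
as_octal = int("50102", 8)
print(as_octal, hex(as_octal))      # 20546 0x5042

# Reading 2: the UTF-8 encoding of U+00F6, two bytes, glued together
# into a single big-endian integer and printed in decimal.
utf8 = "\u00f6".encode("utf-8")
as_bytes = int.from_bytes(utf8, "big")
print(utf8.hex(), as_bytes)         # c3b6 50102

# The Unicode code point itself is neither of those numbers.
print(ord("\u00f6"), hex(ord("\u00f6")))  # 246 0xf6
```

Only the last value, the code point, is something a \u00F6-style escape
would carry.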

> POSIX often is a solution to a problem that doesn't exist
> in the first place when you just use UTF-8.
> 
> > They have nothing to do with UTF-8.
> 
> That's exactly the point. Collating elements are depending on the current
> locale which is too much of a mess to deal with.

Huh?

> So when the Spanish "ll" collates before "m" and after "l" in a given
> locale, we don't give a fuck.
> So please give me the point why you are torturing me with this
> information.

Because collating elements are the thing POSIX forbids which you appear
to have _misinterpreted_ as forbidding multibyte characters. Otherwise I
have _no idea_ what in POSIX you interpret as preventing reasonable
behavior with UTF-8 multibyte characters.
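
The distinction can be sketched in a few lines of Python. This uses
Python's own str.translate as a stand-in, not tr and not anything from
your implementation, and the sample strings are made up. A multibyte
character is one character that happens to take several bytes in UTF-8;
a collating element such as "ll" is several characters that a locale
sorts as a unit.

```python
# U+00F6 is ONE character, even though UTF-8 needs two bytes for it.
o_umlaut = "\u00f6"
print(len(o_umlaut), len(o_umlaut.encode("utf-8")))   # 1 2

# Character-level translation (what a multibyte-aware tool would do):
# map the single character U+00F6 to "o".
table = str.maketrans(o_umlaut, "o")
print("sch\u00f6n".translate(table))                  # schon

# "ll" is TWO characters. Treating it as one unit is a locale collation
# matter and has nothing to do with how many bytes a character takes.
print(len("ll"), len("ll".encode("utf-8")))           # 2 2
```

Handling the first case correctly requires no knowledge of the second.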

> I stated that I did not implement collating elements into this tr(1) at
> the beginning and that it's a POSIX-nightmare to do so, bringing harm
> to anybody who is interested in a consistent, usable tool.

tl;dr:

Collating elements = POSIX forbids them = You don't want them anyway.
Multibyte characters = POSIX allows/requires them = You like them too.
What is the problem?
I don't know what you want to do that you think POSIX doesn't allow.
