On Fri, Jan 9, 2015, at 18:08, FRIGN wrote:
>
> This is madness. If you want the bytes to be collated,

I don't see where you're getting that either of us wants the bytes to be
collated. I don't even know what you mean by "collated", since collating
is not what tr does, except when ordering ranges.

> you just write the
> literal \50102.

Even if octal values could be more than three digits, I have no idea
what you think 50102 is. Read as octal, its decimal value is 20546 and
its hex value is 0x5042. I have no idea what it has to do with character
U+00F6, whose UTF-8 representation is 0xC3 0xB6...

I just realized what you're doing: 0xC3B6 has the _decimal_ value 50102.
I have no idea why you would think _that_ is a representation people
would want to use. If you're so pro-Unicode, make it accept \u00F6 -
that's a valid extension. But reusing the syntax POSIX uses for
three-digit octal literals for arbitrarily long decimal literals that
aren't even Unicode code points makes no sense at all. In what universe
is that intuitive?

> POSIX often is a solution to a problem that doesn't exist
> in the first place when you just use UTF-8.
>
> > They have nothing to do with UTF-8.
>
> That's exactly the point. Collating elements are depending on the current
> locale which is too much of a mess to deal with.

Huh?

> So when the Spanish "ll" collates before "m" and after "l" in a given
> locale, we don't give a fuck.
> So please give me the point why you are torturing me with this
> information.

Because collating elements are the thing POSIX forbids, which you appear
to have _misinterpreted_ as forbidding multibyte characters. Otherwise I
have _no idea_ what in POSIX you interpret as preventing reasonable
behavior with UTF-8 multibyte characters.

> I stated that I did not implement collating elements into this tr(1) at
> the beginning and that it's a POSIX-nightmare to do so, bringing harm
> to anybody who is interested in a consistent, usable tool.

tl;dr:
Collating elements = POSIX forbids them = You don't want them anyway.
Multibyte characters = POSIX allows/requires them = You like them too.

What is the problem? I don't know what you want to do that you think POSIX doesn't allow.
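
P.S. The arithmetic above is easy to check. A minimal Python sketch (the
variable names are mine, purely for illustration; this says nothing about
how any tr implementation parses escapes):

```python
# U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS) and its UTF-8 encoding.
ch = "\u00f6"
utf8 = ch.encode("utf-8")             # two bytes: 0xC3 0xB6

# "50102" read as an octal literal, ignoring the three-digit limit.
as_octal = int("50102", 8)

# The two UTF-8 bytes glued together as one big-endian integer.
glued = int.from_bytes(utf8, "big")

print(utf8.hex())                     # c3b6
print(as_octal, hex(as_octal))        # 20546 0x5042
print(glued)                          # 50102
```

So 50102 only shows up if you take the UTF-8 bytes, treat them as a
single 16-bit number, and print that number in decimal. It is neither
the code point (0xF6 = 246) nor an octal spelling of either byte.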