On Sat, 10 Jan 2015 22:47:09 +0100 Markus Wichmann <nullp...@gmx.net> wrote:
> You wanted to be Unicode compatible, right? Because in that case I
> expect [:alpha:] to be the class of all characters in General Category L
> (that is, Lu, Ll, Lt, Lm, or Lo). That includes a few more characters
> than just A-Z and a-z. And I don't see you add any other character to
> that class later.

Okay, to clear this up once and for all: Initially, I planned to just
ignore the [:CLASS:] blocks in the interest of a simpler implementation
(if you go all the way, you end up with a complex, crufty POSIX-libc
mess). But Dimitris and Hiltjo rightfully pointed out that we can't
break scripts that easily. That was one motivation for adding basic
support, to at least provide semi-consistent behaviour.
I'm also taking into account that glibc is not the only libc around.
toupper() only operates on ASCII anyway, so you can't work with that.

> So, what I'm saying is, you can't have it both ways: Either you support
> Unicode or not.

That's true, but I never aimed for Unicode support. I only support
UTF-8 in the basic sense, which allows mapping all Unicode characters.

> I really don't see a way to achieve this without including a database of
> sorts into tr itself. (...) If we had a variable
> iterate from 1 to Unicode maximum and call iswalpha() for every one,
> we'd get the set of all alphabetic characters. Can this work for us?

Or we just stop worrying about it. The only reason I added the raw
classes is to avoid breaking scripts in a major way.
I agree that A-Z is not sufficient to define [:upper:]. My plan was to
also include the Greek and Cyrillic alphabets, along with a number of
accented characters.
At the end of the day, we can be relaxed about this, given that this
tr(1) implementation is flexible enough to allow for these ideas.
In 99% of the cases, A-Z is sufficient though. But for a better
experience, I'll augment it as soon as I have put together some of my
ideas.

Thanks for your feedback!

Cheers

FRIGN

-- 
FRIGN <d...@frign.de>