Woohoo! Cool, and thanks very much.
No problem. I can't find anyone to come on board yet, but I did get an answer to your question.
If he's up for it, could you ask him a question? Namely "Treating all text as Unicode--good idea or bad idea?" If the answer's going to be a lot of work you can skip it, that's OK.
The answer is fairly straightforward, fortunately.
Talking to Burnhard and perky on HanIRC, I was able to get the following information:
- there are (of course) some character sets that don't work well with Unicode -- for example, Big5-HKSCS can't be fully encoded in UCS-2 (though I didn't find out why)
- that being said, the consensus was that internal storage as Unicode is a good idea for modern programming languages and APIs.
- Tcl/Tk's method of per-filehandle encoding filters for EUC, johab, etc. seems to be useful and well-received.
So in essence, what I got from the conversation was that internal storage as Unicode is a good thing (and indeed, expected), so long as a method for conversion on input/output is provided.
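
To make that concrete, here is a rough sketch of the "Unicode inside, convert at the edges" idea in Python. It is only an illustration, not what Tcl/Tk does internally: the helper names read_all and write_all are made up for this example, and the sample word and the choice of EUC-KR in, Johab out are my own assumptions.

```python
import io

def read_all(raw: bytes, encoding: str) -> str:
    # Input boundary: the encoding is attached to the stream once,
    # so everything read from it is already Unicode text.
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding)
    return stream.read()

def write_all(text: str, encoding: str) -> bytes:
    # Output boundary: Unicode text is converted to the target
    # encoding only as it leaves the program.
    sink = io.BytesIO()
    stream = io.TextIOWrapper(sink, encoding=encoding)
    stream.write(text)
    stream.flush()
    stream.detach()  # keep `sink` usable after the wrapper goes away
    return sink.getvalue()

# Hypothetical sample data: the same word arrives as EUC-KR bytes
# and leaves as Johab bytes; in between it is plain Unicode.
euc_kr_input = "한글".encode("euc_kr")
text = read_all(euc_kr_input, "euc_kr")
johab_output = write_all(text, "johab")
```

The program logic in the middle only ever sees `text`, so it doesn't need to know which legacy encoding the data came from or is going to.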
Sorry if that doesn't answer all the nuances of the question, but that's the best I can do for now.
Cheers,
~kj