Aaron,
I happen to agree with Dan about the unwieldiness of replacing characters with their full names during character translation, but your idea of using Unicode equivalents seems more palatable. For the moment, I'm going to set aside the issue of how this method of handling errors fits into the scheme of possible error-handling methods, because I want to talk about that in a separate email. Having said that, I have a few specific questions about some of your design choices. It's late and I'm tired, so I'm probably a bit incoherent. If you have trouble understanding me, let me know and I'll try to clarify. So, first:


How are you going to choose between canonical and compatibility compositions when such a choice has to be made? For example, when encoding combining characters, vertically oriented text, or Korean jamo vs. syllables, how will you pick between the four different normalization forms (NFC, NFD, NFKC, NFKD)?
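
To make the question concrete, here is a small Python sketch of my own (not anything from your proposal) that runs two sample strings through all four forms using the standard unicodedata module. The helper name normalize_all and the sample strings are just illustrations I picked; the point is only that the four forms can give four different code point sequences for the same input.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normalize_all(text):
    """Return a dict mapping each normalization form name to the normalized text."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}

def code_points(text):
    """Render a string as a list of U+XXXX labels so differences are visible."""
    return [f"U+{ord(ch):04X}" for ch in text]

# U+FB01 LATIN SMALL LIGATURE FI, then "e" plus U+0301 COMBINING ACUTE ACCENT.
ligature_sample = "\ufb01e\u0301"

# Three conjoining Korean jamo (U+1112, U+1161, U+11AB) that compose to one syllable.
jamo_sample = "\u1112\u1161\u11ab"

for label, sample in (("ligature", ligature_sample), ("jamo", jamo_sample)):
    for form, value in normalize_all(sample).items():
        print(label, form, code_points(value))
```

The ligature only comes apart under the two compatibility forms, while the jamo collapse into a single syllable under the two composed forms, so whichever form gets picked changes what the target encoding is asked to represent.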

If a transparent conversion is required to get a string into Unicode before transforming out of it, do we print the source character or its Unicode equivalent if an error occurs?
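
Here is a rough Python sketch of the two options, in case it helps. The function name describe_failure is hypothetical, and I am assuming the source encoding round-trips through Unicode (true for single-byte code pages like Latin-1, not guaranteed in general), so that the offending source bytes can be recovered by re-encoding the failing characters.

```python
def describe_failure(raw, source_encoding, target_encoding):
    """Return None if raw transcodes cleanly, else (source_hex, unicode_label).

    source_hex    - the offending character as bytes in the source encoding
    unicode_label - the same character as U+XXXX code points
    """
    # The "transparent" step: get the data into Unicode first.
    text = raw.decode(source_encoding)
    try:
        text.encode(target_encoding)
    except UnicodeEncodeError as err:
        bad = text[err.start:err.end]
        # Option A: report what the user actually supplied.
        # Assumption: re-encoding yields the original bytes for this codec.
        source_hex = bad.encode(source_encoding).hex()
        # Option B: report the intermediate Unicode equivalent.
        unicode_label = " ".join(f"U+{ord(ch):04X}" for ch in bad)
        return source_hex, unicode_label
    return None

# Latin-1 "café" cannot be written as ASCII; show both ways of naming the culprit.
print(describe_failure(b"caf\xe9", "latin-1", "ascii"))
```

Either report is easy to produce; what I'm asking is which one the user should see, since the source bytes are what they typed but the Unicode code point is what the converter actually choked on.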

Michael
