We need to work out how to make this happen. IMHO there are three (semi-)orthogonal issues we have to tackle:

1. Input (Unicode input from the GUI)
2. Output (write UTF-8 to file and parse it)
3. Storage (store the Unicode characters in our paragraphs)
Of these, 1 and 3 can be done without doing 2, as long as we continue to convert to the document encoding as we do now. (If the doc is latin1, we read it as utf8, convert to latin1, and store that internally.) Something similar can be done for GUI input, I guess. Perhaps it must be done anyway; that depends on whether we can get the GUI to give us Unicode input regardless of how the locale is set up.

For 3, expanding the actual storage to 4 bytes per character (UTF-32/UCS-4) is simple: just a typedef. There are a lot of details that must be handled, but I don't think we have to discover and list all of them before we begin.

I have created a branch for myself to begin working on this; if others want to join in, I'll move the branch out of my personal dir. I think the initial work should be done on a branch with semi-frequent rebasing to current trunk. At the first opportunity, when we have something that mostly works, we merge to trunk and continue from there.

Other musings on this subject? -- Lgb
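
To make the storage and conversion points concrete, here is a minimal C++ sketch. It is only an illustration, not code from the branch: the names `ucs4_char`, `ucs4_string`, `utf8_to_ucs4` and `ucs4_to_latin1` are hypothetical, and the choice of `?` as the replacement for characters that latin1 cannot represent is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names. The storage change is "just a typedef": one stored
// character becomes a 32-bit code point (UCS-4) instead of a single byte.
typedef char32_t ucs4_char;
typedef std::u32string ucs4_string;

// Decode UTF-8 into UCS-4 code points. Malformed or truncated sequences
// become U+FFFD. Overlong forms and surrogates are NOT rejected here;
// a real decoder would have to handle them.
ucs4_string utf8_to_ucs4(const std::string& in) {
    ucs4_string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        ucs4_char cp = 0;
        int extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) { cp = lead; extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray byte
        ++i;
        bool ok = true;
        for (int k = 0; k < extra; ++k) {
            if (i >= in.size()) { ok = false; break; }
            const unsigned char c = static_cast<unsigned char>(in[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : ucs4_char(0xFFFD));
    }
    return out;
}

// Convert to latin1 for a latin1 document. Code points above U+00FF
// cannot be represented and are replaced by `fallback` (an assumption).
std::string ucs4_to_latin1(const ucs4_string& in, char fallback = '?') {
    std::string out;
    out.reserve(in.size());
    for (ucs4_char cp : in) {
        out.push_back(cp <= 0xFF
            ? static_cast<char>(static_cast<unsigned char>(cp))
            : fallback);
    }
    return out;
}
```

With these helpers, `ucs4_to_latin1(utf8_to_ucs4(bytes))` mirrors the "read as utf8, convert to latin1, store" path described above. Once storage itself is `ucs4_string`, the second step could be dropped and the conversion deferred to output.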