We need to look into how we shall make this happen.

IMHO there are three (semi-)orthogonal issues we have to tackle:
        1. Input (unicode input from gui)
        2. Output (write utf8 to file and parse it)
        3. Storage (Store the unicode chars in our paragraphs)

Of these, 1 and 3 can be done without doing 2, as long as we continue to
convert to the document's encoding as we do now. (If the doc is latin1, we
read it as utf8, do a conversion to latin1, and store that internally.)
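
A minimal sketch of that conversion step, to make the idea concrete. The
function name utf8_to_latin1 is hypothetical (it is not an existing function
in our tree), and the code assumes well-formed utf8 input; it does not
reject overlong or truncated sequences. Anything latin1 cannot represent is
replaced by '?', which is only one possible policy.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only: convert utf8 text to latin1.
// Assumes well-formed input; characters outside latin1 become '?'.
std::string utf8_to_latin1(std::string const & in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char const c = static_cast<unsigned char>(in[i]);
        unsigned long cp = 0;   // decoded code point
        std::size_t len = 1;    // number of bytes in this sequence
        if (c < 0x80) {
            cp = c;                         // plain ASCII
        } else if ((c & 0xE0) == 0xC0) {
            cp = c & 0x1F; len = 2;         // 2-byte sequence
        } else if ((c & 0xF0) == 0xE0) {
            cp = c & 0x0F; len = 3;         // 3-byte sequence
        } else if ((c & 0xF8) == 0xF0) {
            cp = c & 0x07; len = 4;         // 4-byte sequence
        } else {
            cp = 0xFFFD;                    // stray byte, not representable
        }
        // Fold in the low six bits of each continuation byte.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        // latin1 covers exactly the code points 0x00..0xFF.
        if (cp <= 0xFF)
            out += static_cast<char>(static_cast<unsigned char>(cp));
        else
            out += '?';
        i += len;
    }
    return out;
}
```

With this, the two utf8 bytes for e-acute collapse to the single latin1 byte
0xE9, while the euro sign (not in latin1) comes out as '?'.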

Something similar can be done for gui input, I guess. And perhaps it must
be done anyway, depending on whether we can get the gui to give us unicode
input regardless of how the locale is set up.

For 3, expanding the actual storage to 4 bytes (utf32/ucs4) is simple,
just a typedef. There are a lot of details that must be handled, but I
don't think we have to discover/list all of those before we begin.
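
To show what "just a typedef" could look like, here is a rough sketch. The
names char_type and docstring are assumptions made for this example, not a
decision about what the real types will be called, and it uses the standard
char32_t type for the 4-byte storage unit. make_sample is likewise only a
hypothetical helper for the demonstration.

```cpp
#include <string>

// Hypothetical names: one storage unit holds one ucs4 code point.
typedef char32_t char_type;
typedef std::basic_string<char_type> docstring;

// Hypothetical helper: build a short paragraph-like string by hand.
docstring make_sample()
{
    docstring s;
    s += char_type(0x61);      // 'a', fits in latin1
    s += char_type(0x20AC);    // euro sign, outside latin1
    s += char_type(0x1D11E);   // musical G clef, outside the 16-bit range
    return s;
}
```

The point of the wider unit is that every character occupies exactly one
element, so make_sample().size() is 3 even though the last character would
need four bytes in utf8.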

I have created a branch for myself to begin working on this; if others
want to join in then I'll move the branch out of my personal dir. I
think the initial work should be done on a branch with semi-frequent
rebasing to current trunk. At the first opportunity, when we have
something that mostly works, we merge to trunk and continue from there.

Other musings on this subject?

-- 
        Lgb
