Angus Leeming <[EMAIL PROTECTED]> writes:

| Lars Gullik Bjønnes wrote:
| > I have been trying to look at the ICU api, but I find the
| > documentation utterly confusing and hard to get a clear understanding
| > on how it works. (Probably caused by me not finding a "Hello World"
| > code snippet)
| >
| > Also, I must say, some of this is based on really old (before 2000)
| > ideas on how to write portable C++. And at least to me some of this
| > feel antiquated. Makes me feel a bit uneasy.
| >
| > Would be nice if some of you could have a look at this lib as well,
| > and see what you think of it. I know it is _The_ Unicode lib to use,
| > but still...
|
| Perhaps it would help us all if you tried to map out exactly where you
| propose to use ICU? If I understand correctly, we'd have:
|
| LyX file format in UTF-8 --ICU--> data in UCS-4 encoded std::wstrings
|
| data in memory --ICU--> LyX file
|
| data in memory --ICU--> Multibyte char strings used by XForms
|
| data in memory --> QString, no conversion needed?
|
| If that's correct, then effectively LyX doesn't see the unicode at all. It
| continues to operate on a char_type-based string, where char_type is now
| wchar_t, or uint or whatever is portable.

won't really work well...

| Again, if that's correct, then I don't think that the ICU interface matters
| at all. Wrap it up in your own interface and leave ICU as an
| implementation detail.

To have more than trivial support for Unicode we will have to use ICU
all over: ICU types for storage in memory (single code points), which
we must change into Unicode strings before display and transformation
to glyphs (otherwise surrogates and combining chars will not work).

In the buffer we will have to store either UChar (UTF-16 code units) or
UChar32 (UCS-4/UTF-32), but before display (and thus paragraph
breaking) this must be changed into UnicodeString (ICU uses UTF-16 for
strings), to get the correct number of glyphs and the correct width.
(Cursor positioning will be fun...)

(Even UCS-4 is not "one code point, one glyph"; combining chars are
required for proper display.)

--
Lgb
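
To make the code unit / code point / glyph distinction above concrete,
here is a minimal sketch that uses only the C++ standard library, not
ICU. The function names count_code_points and count_base_chars are
hypothetical, made up for this illustration. The second function only
knows about the combining diacritical marks in U+0300..U+036F, which is
a deliberate simplification; real segmentation needs the full grapheme
cluster rules (in ICU, as far as I can tell, UnicodeString::countChar32()
and a character BreakIterator cover these two jobs).

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-16 string.
// A high surrogate followed by a low surrogate is one code point.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        const bool high = c >= 0xD800 && c <= 0xDBFF;
        if (high && i + 1 < s.size()) {
            const char16_t d = s[i + 1];
            if (d >= 0xDC00 && d <= 0xDFFF) {
                ++i;  // consume the low surrogate as part of this code point
            }
        }
        ++n;
    }
    return n;
}

// Rough count of user-visible characters in a UTF-32 string.
// ASSUMPTION: only U+0300..U+036F (combining diacritical marks) are
// treated as combining; this is not a full grapheme cluster algorithm.
std::size_t count_base_chars(const std::u32string& s) {
    std::size_t n = 0;
    for (const char32_t c : s) {
        if (c < 0x0300 || c > 0x036F) {
            ++n;  // not a combining mark, so it starts a new visible character
        }
    }
    return n;
}
```

For example, U+1D11E (musical G clef) takes two UTF-16 code units but is
one code point, and "e" followed by U+0301 (combining acute) is two code
points even in UTF-32 but should be displayed, and stepped over by the
cursor, as one character.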