Charles de Miramon <[EMAIL PROTECTED]> writes:

| Lars Gullik Bjønnes wrote:
|
| > Btw. Does Qt even support utf-16? I thought they only supported ucs-2.
| > AFAICS there is no support for surrogate chars in Qt.
|
| Yes:
|
| utf16:
| http://doc.trolltech.com/4.2/qstring.html

Or no... AFAIK QString does not support the utf-16 surrogates that are needed to access chars outside the basic plane. That means only a limited form of utf-16 is supported (the one that is just called 'ucs-2').

| surrogate:
|
| Qt documentation talks about it for grapheme clusters (id est 2 characters
| resulting in one character on screen):
|
| http://doc.trolltech.com/4.2/qtextlayout.html

That is something different (combining chars). Regardless of encoding, Unicode requires you to combine characters to produce some glyphs.

| > | Would you consider using QString for storing unicode strings? QString
| > | is now part of QtCore, a subset of Qt.
| >
| > I'd hope to not do that.
| >
| > Currently I am still exploring storing ucs-4 codepoints in the
| > std::vector that contains the characters of the document. Also quite
| > luckily codepoint conversion is quite fast.
| >
|
| I hope you consider that if you leave LyX to sail the seven seas in your
| yacht, another programmer will have more difficulty picking up the project
| if it is LarsGullikUnicodeLibrary than if it is QString (documented, books
| about it, used in many projects), even if hacking LarsGullikUnicodeLibrary
| is certainly more fun than reusing boring stuff off the shelf.

I am not quite sure where you get these ideas from. QString is not a silver bullet, and using it would mean changing our internal storage completely.

If QChar had been 32 bits wide, we could easily have used ucs-4 for our internal codepoints. Luckily, I think we are still able to do that. But as long as Qt only supports 16-bit chars, the Unicode support in LyX (using Qt) will be limited to the basic plane.

So what I plan to do is to store ucs-4 in our paragraph vector and, when rendering, transform it in a frontend-specific way into something the frontend can handle. For Qt this is ucs-2 strings, which are then used for rendering. Chars/glyphs outside the basic plane will then have to be rendered as a '?'. But for gtk, f.ex., which uses pango, we can handle full Unicode (since pango uses a ucs-4 unichar).

--
Lgb
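The rendering-time conversion described in the reply can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not LyX or Qt code: it assumes the paragraph text is held as a `std::vector<char32_t>` of ucs-4 code points, and the names `ucs4_string`, `is_surrogate`, `to_ucs2_lossy` and `to_utf16` are hypothetical. The first conversion mimics a 16-bit-only frontend by replacing anything outside the basic plane with '?'; the second shows, for contrast, the surrogate-pair encoding that full utf-16 support would require. It uses C++11 character types (`char32_t`, `std::u16string`).

```cpp
#include <string>
#include <vector>

// Hypothetical name: paragraph text stored as one ucs-4 code point per element.
using ucs4_string = std::vector<char32_t>;

// True for the utf-16 surrogate range, which encodes no characters by itself.
inline bool is_surrogate(char32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

// Lossy ucs-4 -> ucs-2 conversion for a frontend limited to 16-bit chars.
// Code points above U+FFFF (outside the basic plane) and stray surrogate
// values become '?'; everything else maps to exactly one 16-bit unit.
std::u16string to_ucs2_lossy(const ucs4_string& text)
{
    std::u16string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        if (cp <= 0xFFFF && !is_surrogate(cp))
            out.push_back(static_cast<char16_t>(cp));
        else
            out.push_back(u'?');
    }
    return out;
}

// For contrast: full utf-16, where code points above U+FFFF are written as
// a high/low surrogate pair. Invalid input (surrogate values or anything
// above U+10FFFF) becomes U+FFFD REPLACEMENT CHARACTER.
std::u16string to_utf16(const ucs4_string& text)
{
    std::u16string out;
    for (char32_t cp : text) {
        if (is_surrogate(cp) || cp > 0x10FFFF) {
            out.push_back(u'\uFFFD');
        } else if (cp <= 0xFFFF) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            char32_t v = cp - 0x10000;  // 20 significant bits remain
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```

For example, the input `{U'a', 0x00E9, 0x1D11E, U'z'}` becomes `u"a\u00E9?z"` through `to_ucs2_lossy`, while `to_utf16` writes U+1D11E as the pair 0xD834 0xDD1E. Keeping the stored text in ucs-4 means the lossy step happens only at the frontend boundary, so a pango-based frontend could receive the full code points unchanged.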