Charles de Miramon <[EMAIL PROTECTED]> writes:

| Lars Gullik Bjønnes wrote:
| 
| > Btw. Does Qt even support utf-16? I thought they only supported ucs-2.
| > AFAICS there is not support for surrogate chars in Qt.
| 
| Yes :
| 
| utf16 :
| http://doc.trolltech.com/4.2/qstring.html

Or no... afaik QString does not support utf-16 surrogates in a way that
lets chars outside the basic plane be accessed.

That means that only a limited form of utf-16 is supported (the one
that is just called 'ucs-2').
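
To make the difference concrete, here is a minimal sketch in plain
standard C++ (not Qt or LyX code; the names to_utf16 and is_surrogate
are made up for illustration) of how utf-16 encodes a codepoint. Anything
above U+FFFF needs two 16-bit units, a surrogate pair, and a string type
that treats every 16-bit unit as one complete char cannot address such a
codepoint as a single char.

```cpp
#include <cstdint>
#include <vector>

// Illustrative helper: encode one Unicode codepoint as UTF-16 code units.
// Codepoints in the basic plane fit in one unit; codepoints above U+FFFF
// are split into a high and a low surrogate.
std::vector<std::uint16_t> to_utf16(std::uint32_t cp)
{
    if (cp < 0x10000)
        return { static_cast<std::uint16_t>(cp) };

    cp -= 0x10000;  // leaves a 20-bit value
    return { static_cast<std::uint16_t>(0xD800 + (cp >> 10)),     // high surrogate
             static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)) }; // low surrogate
}

// Illustrative helper: true if a 16-bit unit is one half of a surrogate
// pair, i.e. not a complete character on its own.
bool is_surrogate(std::uint16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDFFF;
}
```

For example, to_utf16(0x1D11E) gives the two units 0xD834 0xDD1E, while
to_utf16(0x00E9) gives the single unit 0x00E9.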
 
| surrogate :
| 
| Qt documentation talks about it for grapheme cluster (id est 2 characters
| resulting in one character on screen) :
| 
| http://doc.trolltech.com/4.2/qtextlayout.html

That is something different (combining chars).
Regardless of encoding, unicode requires you to combine characters to
produce glyphs.

| > | Would you consider using Qstring for storing unicode strings ? Qstring
| > | is now part of QtCore a subset of Qt.
| > 
| > I'd hope to not do that.
| > 
| > Currently I am still exploring storing ucs-4 codepoints in the
| > std::vector that contains the characters of the document. Also quite
| > luckily codepoint conversion is quite fast.
| > 
| 
| I hope you consider that if you leave LyX to sail the seven seas in your
| yacht, another programmer will have more difficulty to pick up the project
| if it is LarsGullikUnicodeLibrary than if it is Qstring (documented, book
| about it, used in many projects), even if hacking LarsGullikUnicode Library
| is certainly funnier than reusing boring stuff off the shelf.

I am not quite sure where you get these ideas from.

QString is not a silver bullet. Using it would mean changing our
internal storage completely.

If QChar had been 32 bits wide, then we could easily have used ucs-4 as
our internal codepoints. Luckily I think we are still able to do that.
But as long as Qt only supports 16-bit chars, the unicode support in lyx
(using qt) will be limited to the basic plane.

So what I plan to do is to store ucs-4 in our paragraph vector and, when
rendering, transform that in a frontend-specific way into something the
frontend can handle. For Qt this means ucs-2 strings, which are then used
to render. Chars/glyphs outside the basic plane will then have to be
rendered as '?'. But for gtk, for example, which uses pango, we can handle
the full unicode (since pango uses a ucs-4 unichar).
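
A rough sketch of what that frontend-side narrowing could look like, in
plain standard C++. This is not actual LyX code: the function name
ucs4_to_ucs2 and the use of std::vector for both sides are assumptions
made only for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: narrow UCS-4 codepoints to UCS-2 for a frontend
// that can only handle 16-bit chars. Anything that is not a valid
// basic-plane char is replaced with '?'.
std::vector<std::uint16_t> ucs4_to_ucs2(const std::vector<std::uint32_t>& in)
{
    const std::uint16_t replacement = static_cast<std::uint16_t>('?');

    std::vector<std::uint16_t> out;
    out.reserve(in.size());

    for (std::uint32_t cp : in) {
        const bool in_basic_plane = cp <= 0xFFFF;
        // Surrogate values are not chars by themselves, so replace them too.
        const bool is_surrogate = cp >= 0xD800 && cp <= 0xDFFF;

        if (in_basic_plane && !is_surrogate)
            out.push_back(static_cast<std::uint16_t>(cp));
        else
            out.push_back(replacement);
    }
    return out;
}
```

The stored ucs-4 document is left untouched; only the copy handed to the
16-bit frontend loses the chars outside the basic plane.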

-- 
        Lgb
