Charles de Miramon <[EMAIL PROTECTED]> writes:

| Lars Gullik Bjønnes wrote:
|
| > So what I plan to do is to store ucs-4 in our paragraph vector, and when
| > rendering, transform that in a frontend-specific way to something the
| > frontend can handle. For Qt this is ucs-2 strings, and use that to
| > render. Chars/glyphs outside the basic plane will then have to be
| > rendered with a '?'. But for gtk, e.g., which uses pango, we can handle
| > the full unicode. (Since pango uses a ucs-4 unichar.)
| >
|
| Why do you want to store text in UTF-32? From what I understand from the
| unicode FAQ, UTF-32 has a large memory cost for little benefit over UTF-16.

You forget the fun of surrogates.

| http://www.unicode.org/unicode/faq/utf_bom.html
|
| Pango has been criticized for being a resource and memory hog.

That is pango.

| One point made by Lars Knoll in his presentation is that the difficulty when
| you go down the Unicode lane is not to degrade the performance

If performance is the goal then ucs-4 is the solution. (Why? Because a 32-bit
type is often a very fast type to access.)

| for 'normal' users too much. With ucs-4, if I'm correct, you multiply by 4
| the memory size of a LyX document, with ucs-2, you multiply by 2.

larsbj  5756  1.9  2.0 117828 20808 pts/5  S+  09:35  0:00 ./src/lyx-qt
larsbj  5765  4.5  1.9 116936 19940 pts/4  S+  09:36  0:00 ./src/lyx-qt3

Both of the above are a lyx qt3 just after UserGuide.lyx is loaded. One with
32-bit chars stored and one with 8-bit chars stored. The difference is some
0.7% in virtual size (4.3% in resident size).

| Why go above the Basic Multilingual Plane and therefore an ucs-2 encoding?
| Basic Multilingual Plane is not so basic. It covers all the languages
| written today on our planet.

You are of course aware of the fact that unicode now contains more than
90000 characters, and that the basic plane only covers two thirds of this?

--
	Lgb
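
To make the plan above concrete, here is a rough sketch of the two conversions being discussed: the lossy ucs-4 to ucs-2 step for a frontend that only handles the basic plane, and the UTF-16 alternative where surrogates come into play. The type and function names (Ucs4String, Ucs2String, fitsInUcs2, toUcs2, toUtf16) are made up for illustration and are not taken from the LyX sources; it assumes the paragraph text is held as a vector of 32-bit code points.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical names, for illustration only; not from the LyX sources.
using Ucs4String = std::vector<std::uint32_t>;
using Ucs2String = std::vector<std::uint16_t>;

// True for code points that fit in one 16-bit unit: inside the basic plane
// and not in the range reserved for surrogates (U+D800..U+DFFF).
inline bool fitsInUcs2(std::uint32_t c)
{
    return c <= 0xFFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// Lossy conversion for a frontend that only handles ucs-2:
// anything that does not fit is replaced by '?'.
Ucs2String toUcs2(Ucs4String const & in)
{
    Ucs2String out;
    out.reserve(in.size());
    for (std::uint32_t c : in)
        out.push_back(fitsInUcs2(c) ? static_cast<std::uint16_t>(c)
                                    : std::uint16_t('?'));
    return out;
}

// Lossless alternative: UTF-16, where a code point above U+FFFF becomes
// a surrogate pair, so one character may occupy two 16-bit units.
Ucs2String toUtf16(Ucs4String const & in)
{
    Ucs2String out;
    for (std::uint32_t c : in) {
        if (fitsInUcs2(c)) {
            out.push_back(static_cast<std::uint16_t>(c));
        } else if (c >= 0x10000 && c <= 0x10FFFF) {
            std::uint32_t const v = c - 0x10000;
            out.push_back(static_cast<std::uint16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));
        } else {
            out.push_back(0xFFFD); // lone surrogate or out of range
        }
    }
    return out;
}
```

With ucs-4 storage, the n-th element is always the n-th character. In the UTF-16 variant that no longer holds once a surrogate pair appears, so cursor movement and indexing would need extra care.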