On Thursday 19 December 2002 12:31 pm, Kuba Ober wrote:
> On Thursday 19 December 2002 02:34 am, you wrote:
> > On Wed, Dec 18, 2002 at 11:52:10AM -0500, Kuba Ober wrote:
> >
> > As far as I know there are already more than 2^16 Chinese characters if
> > all historic variants are taken into account. 2^16 gets tight if
> > artificial scripts like Klingon are included. If one starts again with
> > "code pages" and similar, all the old cruft is back. 32 bits is the way
> > to go...
> >
So it seems in 99% of cases, but the remaining 1% or more is still not trivial.
Read on.
> >
> > > The reason I'm asking is that it would really make QString <->
> > > basic_string<uint16_t> conversion very quick, and Lars would be happier
> > > not having to QString'ify whole LyX, methinks.
> >
> > This certainly won't happen. Apart from that why do you believe that a
> > quick conversion is necessary? Did this item show up in some profiler?
>
> No. It's just that faster code means less code in that situation, and don't
> we all like less code? As in: the best patch is the one which only removes
> code while preserving functionality ;-)

Still, it's probably true that a 32-bit encoding with a *mostly* one-to-one
mapping between characters and dwords is easy to use in the common case. It
is just not easy in every case.

Here is what I keep wondering:
- for Arabic script, at least, there is no one-to-one relationship between
UCS-4 dwords and character spaces / unique cursor positions, so that benefit
of UCS-4 is gone in the general case, methinks
- since that breaks the simple "full information about one
character/composite glyph per 32 bits" assumption, one could just as well go
with UTF-8, right?
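
To make the first point concrete, here is a rough Python sketch (not LyX or
Qt code). The function names are made up for illustration, and "skip the
combining marks" is only a crude stand-in for real cursor-position logic,
which also has to deal with ligatures, joining forms and so on.

```python
import unicodedata


def code_points(text: str) -> int:
    """Number of 32-bit units a UCS-4 (UTF-32) buffer needs for text."""
    return len(text.encode("utf-32-le")) // 4


def approx_cursor_positions(text: str) -> int:
    """Crude estimate: count only code points that are not combining marks.

    This is an assumption made for illustration, not a full grapheme
    cluster algorithm.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# ARABIC LETTER BEH + ARABIC FATHA (a combining vowel mark) + ARABIC LETTER LAM
sample = "\u0628\u064E\u0644"

print(code_points(sample), approx_cursor_positions(sample))  # prints "3 2"
```

So even with one dword per code point, the cursor still cannot simply step
one dword at a time.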

It seems that UTF-8 never needs more than 32 bits per Unicode code point, even
in the worst case (4 bytes for anything up to U+10FFFF). Right? And it's very
compact for English text. And since we cannot assume "one glyph/character
space per 32 bits" globally, there seems to be no reason not to use the more
compact UTF-8 encoding, with about the same amount of processing as UCS-4
requires.
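
The size comparison is easy to check with a few lines of Python. This is
only a sketch; `encoded_sizes` is a name I made up, and the sample strings
are arbitrary.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 size, UTF-32 size) in bytes, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-32-le"))


samples = {
    "english": "hello",                 # ASCII: 1 byte per code point in UTF-8
    "arabic": "\u0628\u064E\u0644",     # 2 bytes per code point in UTF-8
    "chinese": "\u4e2d\u6587",          # 3 bytes per code point in UTF-8
    "beyond_bmp": "\U00010400",         # above U+FFFF: 4 bytes in UTF-8
}

for name, text in samples.items():
    utf8_size, utf32_size = encoded_sizes(text)
    print(f"{name}: utf-8={utf8_size} bytes, utf-32={utf32_size} bytes")
```

In every sample the UTF-8 form is no larger than the UTF-32 form, and for
plain English text it is a quarter of the size.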

Cheers, Kuba Ober
