Re: ICU - uneasy feeling

Lars Gullik Bjønnes Thu, 13 Oct 2005 15:08:40 -0700

Asger Ottar Alstrup <[EMAIL PROTECTED]> writes:

| Lars Gullik Bjønnes wrote:
| > No. I am not sure... but it depends... a combining character can be
| > used to produce accents as well... why not an umlaut on top of a
| > grave on top of an 'e'.
| 
| The reason I suggest a unicode inset is that we already have it: the
| latex accent inset.


That is rather different... you could only use it as a template.

| Of course you can start playing tricks with fancy underlying character
| types which are in fact composed of other things, but that is an
| astronaut design: it is overlayered, so many abstractions on top of
| abstractions, so many that you need as many complicated mechanisms to
| make it go fast - it is so far up in the sky that there is no oxygen
| left, and the brain stops working.

I am not convinced.

| I would think it's best to just start with getting what we already
| have to work in a basic unicode setting, and maybe extend to a few
| eastern languages if volunteers come and help out. Don't worry about
| composed Unicode glyphs for now - it's a corner case that can be
| handled once someone feels the heat (which will probably be when hell
| freezes over AFAICT).

Well... I claim that this is not uncommon at all.

I also claim that we can come quite close by using a UniChar

(basically struct UniChar { int32_t uni_char; })

And have this optimized for storage of single UCS-4 codepoints, while
also having the ability to store n codepoints.
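
To make that concrete, here is a rough sketch of what such a type could
look like. Everything beyond the name UniChar is an assumption made up
for illustration: the SequenceTable side table, the use of the top bit
as a flag, and all member names. It is one possible layout, not a
worked-out design.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical side table that owns the rare multi-codepoint sequences.
class SequenceTable {
public:
    // Stores a sequence and returns its index in the table.
    std::uint32_t add(std::vector<char32_t> seq) {
        seqs_.push_back(std::move(seq));
        return static_cast<std::uint32_t>(seqs_.size() - 1);
    }
    const std::vector<char32_t>& get(std::uint32_t idx) const {
        return seqs_.at(idx);
    }
private:
    std::vector<std::vector<char32_t>> seqs_;
};

// One 32-bit word per character. A single codepoint is stored inline;
// a sequence is stored as a flagged index into the side table.
class UniChar {
public:
    explicit UniChar(char32_t cp) : bits_(cp) {
        assert(cp <= 0x10FFFF);  // valid codepoints never use the top bit
    }
    UniChar(const std::vector<char32_t>& seq, SequenceTable& table) {
        assert(!seq.empty());
        if (seq.size() == 1)
            bits_ = seq[0];                    // common case: stay inline
        else
            bits_ = kSeqFlag | table.add(seq); // rare case: index + flag
    }
    bool isSingle() const { return (bits_ & kSeqFlag) == 0; }
    std::vector<char32_t> codepoints(const SequenceTable& table) const {
        if (isSingle())
            return { static_cast<char32_t>(bits_) };
        return table.get(bits_ & ~kSeqFlag);
    }
private:
    static constexpr std::uint32_t kSeqFlag = 0x80000000u;
    std::uint32_t bits_;
};

static_assert(sizeof(UniChar) == 4, "no storage overhead per character");
```

The point of the layout is that plain text pays nothing extra: a
paragraph of single codepoints is still an array of 32-bit values, and
only characters that really consist of several codepoints touch the
table.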

| The big step that takes us 99.9% of the way is just going
| single-code-point Unicode.

yes... for western languages...
(that already can do fine with latin variants)

| Ligatures and other display headaches are handled by the toolkits
| these days, so don't lose sleep over those.

We have to do something... we have a cursor to position.
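
As an illustration of the cursor problem: with a flat array of
codepoints, moving the cursor right by one "character" has to skip any
combining marks that follow the base character. The sketch below is
deliberately simplified. The function names are made up, and it only
treats the Combining Diacritical Marks block (U+0300..U+036F) as
combining; a real implementation would need proper Unicode character
properties.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplifying assumption: only U+0300..U+036F counts as combining.
inline bool isCombiningMark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Returns the next cursor position after pos, stepping over one base
// character together with all combining marks attached to it.
std::size_t nextCursorPos(const std::vector<char32_t>& text,
                          std::size_t pos) {
    assert(pos <= text.size());
    if (pos == text.size())
        return pos;                 // already at the end
    ++pos;                          // step over the base character
    while (pos < text.size() && isCombiningMark(text[pos]))
        ++pos;                      // and over its combining marks
    return pos;
}
```

For the 'e' + grave + umlaut case, the cursor goes from just before the
'e' to just after the umlaut in one step, never landing between the
marks.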

| The trick is to make the job as small and simple as possible, and
| single-code-point unicode is a huge, monotonic improvement over the
| implicit 8-bit encodings used now, so why make the job harder? There
| is always another release after the next one.

This is the correct time to discuss problems and solutions and how
complete our first unicode version will be. It would be rather stupid
not to think of combining characters.
Just killing the discussion is not good.

-- 
        Lgb
