Asger Alstrup Nielsen <[EMAIL PROTECTED]> wrote:
> Since we only need to convert the variable width encoding when we get
> input from the keyboard, the memory overhead of using the first option
> is insignificant. And the argument is similar when we read from a
> file: We read a line at a time, and convert that, so the memory
> overhead is minimal.
You forget pasting from the X clipboard with the "COMPOUND_TEXT"
property, which can deliver far more than one line or key at a time.
> Step 1: This step is done line by line (or key by key), so the memory
> overhead of switching to 16-bit LString even if the file is 8-bit is
> insignificant.
>
> Step 2: This step is done by the encoding classes, and converts the
> potentially variable width encoding to a fixed width encoding that is
> used as the document representation.
> Optimally, this conversion will use Unicode as the middle layer: First
> convert to Unicode, and then to the appropriate fixed width encoding.
> Notice that it's entirely possible to skip the Unicode middle layer, if
> a more effective encoding converter is written. This step is not
> performance critical.
Unless the encoding of the file is different from the document encoding,
the Unicode middle layer must be skipped. It is too inefficient. Besides,
the current design of the Encoding class is for fixed width <-> fixed
width conversion. In a variable width <-> fixed width conversion, you
have to consider the "shift state characters" in addition to the range
of the leading byte indicators. This is quite complex and must be
optimized without any intermediate layers.
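To illustrate why shift states matter, here is a minimal sketch of a
stateful variable width -> fixed width decoder. It assumes an
ISO-2022-JP-style input in which ESC $ B switches to a two-byte set and
ESC ( B switches back to ASCII. The name decodeShifted and the 16-bit
output layout are hypothetical and are not part of the Encoding class
under discussion; a real converter would also have to validate lead byte
ranges and handle truncated input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only.
// Input:  bytes with ISO-2022-JP-style escape sequences.
// Output: one 16-bit unit per character (fixed width).
std::vector<std::uint16_t> decodeShifted(const std::string& in)
{
    enum class State { Ascii, TwoByte };
    State state = State::Ascii;  // the shift state carried across bytes
    std::vector<std::uint16_t> out;

    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);

        // Escape sequences change the state and produce no output.
        if (c == 0x1B && i + 2 < in.size()) {
            if (in[i + 1] == '$' && in[i + 2] == 'B') {
                state = State::TwoByte;
                i += 3;
                continue;
            }
            if (in[i + 1] == '(' && in[i + 2] == 'B') {
                state = State::Ascii;
                i += 3;
                continue;
            }
        }

        if (state == State::TwoByte && i + 1 < in.size()) {
            // Two input bytes become one fixed width unit.
            const unsigned char d = static_cast<unsigned char>(in[i + 1]);
            out.push_back(static_cast<std::uint16_t>((c << 8) | d));
            i += 2;
        } else {
            // One input byte becomes one fixed width unit.
            out.push_back(c);
            i += 1;
        }
    }
    return out;
}
```

The same byte value means different things depending on the current
state, which is why a per-character table lookup of the kind used for
fixed width <-> fixed width conversion is not enough here.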
> Step 3: This step is done at display time. We need to convert from the
> fixed width encoding used for the document representation to the
> encoding the font renderer uses. Notice that we chose the document
> representation exactly the way we want in order to optimize this
> process.
> In particular, we do not need to go over the Unicode middle layer to
> perform this conversion. So in practice, this can be as effective as
> possible within the constraint that our document representation is fixed
> width.
This is unnecessary. The font encoding *is* wide character based,
provided we use XFontSet. XwcDrawText() must be more efficient than
XmbDrawText(), since X need not do the conversion at display time. The
conversion should already have been made when the XFontSet was created.
If you stick to XFontStruct and XDrawString16(), the situation differs,
since *we* are then the ones to select among the fonts that are required
to render a font set for the language.
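As a rough sketch of the application side of this argument: if the text
is converted from multibyte to wide characters once, when it enters the
document, nothing has to parse multibyte sequences on every redraw. The
helper name toWideOnce is hypothetical, and the example assumes the
standard C library conversion in the current locale; the resulting wide
string is what would be handed to a wide character drawing call such as
XwcDrawText(), which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only.
// Converts a multibyte string (current C locale) to wide characters
// once, so the result can be stored and reused at display time.
std::wstring toWideOnce(const std::string& mb)
{
    // A wide string never has more characters than the multibyte
    // string has bytes, so size() + 1 (for the terminator) suffices.
    std::vector<wchar_t> buf(mb.size() + 1);
    const std::size_t n = std::mbstowcs(buf.data(), mb.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();  // invalid multibyte sequence
    }
    return std::wstring(buf.data(), n);
}
```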
> Step 4: This step is to come from LString to the C string the font
> renderer wants to use. If we use an 8-bit font renderer, but a wide
> character LString, we have a performance bottleneck here, because we
> have to down-copy the text to an 8-bit char array. This is the primary
> reason why we choose to make LString compile time variable sized.
> However, if our X server can handle a wide string, this step is constant
> time.
>
> The key insight is that the encoding conversions do not *have* to use
> the Unicode as the middle layer. If it is considered to be too slow, we
> can omit this step with a hand-tuned converter.
>
> The toUnicode and fromUnicode methods are not the primary encoding
> conversion routines. They are used internally in order to organize
> things, and increase code reuse. (And for the Unicode inset, as
> described in the design document.)
In my understanding, the toUnicode and fromUnicode methods are only for
the Unicode inset and for the character level operations for poorly
supported languages.
Regards,
SMiyata