After reading the source code of the Encoding class, I am somewhat confused.

(1)
In the method convertToUnicode

    std::pair<wstring, LString::const_iterator>
    EncTW_Big5::convertToUnicode(LString const & s) {

we take an 'LString' and return a 'wstring' (paired with an iterator into
the input). According to the documentation, LString is used to represent
the document text itself. However, I would expect convertToUnicode to be
used when we read a byte stream from a file or from keyboard input (or
from an X input server), and that stream is in a variable-length encoding
(such as UTF-8, Big5, SJIS, ...). Yet the input is an LString instead of
a 'char *'. Since an LString holds either single bytes (char) or wide
characters (wchar_t), depending on a compile-time option, how should we
store a variable-length encoding in an LString? There are two
possibilities. First, we can store each byte of the input stream in its
own LChar, regardless of whether it belongs to a one-byte or a two-byte
character. Second, we can store one or two bytes of the input stream in
a single LChar, depending on the encoding. That may sound odd, so let me
explain with an example. Suppose the byte stream is
    0x40               0xa1 0x40
    first character    second character

Suppose also that wchar_t is 16 bits wide. With the first method, the
LString is

    0040,00a1,0040  ===> 48 bits

With the second method, it is

    0040,a140       ===> 32 bits
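To make the two layouts concrete, here is a minimal C++ sketch under
stated assumptions: LChar is taken to be a 16-bit unsigned integer
(standing in for the 16-bit wchar_t case), and the helper names byteWise
and charWise are hypothetical, not part of LyX. The Big5 lead-byte test
(0x81 to 0xFE) is a simplification for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: LChar is 16 bits wide, matching the 16-bit wchar_t case above.
using LChar = std::uint16_t;

// Hypothetical helper for the first method: every input byte gets its own LChar.
std::vector<LChar> byteWise(const std::vector<unsigned char>& in) {
    return std::vector<LChar>(in.begin(), in.end());
}

// Hypothetical helper for the second method: one Big5 character per LChar.
// Simplifying assumption: a byte in 0x81..0xFE is a Big5 lead byte and the
// following byte is its trail byte; anything else is a single-byte character.
std::vector<LChar> charWise(const std::vector<unsigned char>& in) {
    std::vector<LChar> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char b = in[i];
        if (b >= 0x81 && b <= 0xFE && i + 1 < in.size()) {
            out.push_back(static_cast<LChar>((b << 8) | in[i + 1]));  // lead, trail
            ++i;  // the trail byte has been consumed
        } else {
            out.push_back(b);  // single-byte (ASCII) character
        }
    }
    return out;
}

// The example from the text: 0x40 followed by 0xa1 0x40.
void checkExample() {
    const std::vector<unsigned char> bytes = {0x40, 0xa1, 0x40};
    assert((byteWise(bytes) == std::vector<LChar>{0x0040, 0x00a1, 0x0040}));  // 48 bits
    assert((charWise(bytes) == std::vector<LChar>{0x0040, 0xa140}));          // 32 bits
}
```

Note that charWise has to know which bytes are Big5 lead bytes; that is
exactly the encoding knowledge the second method pushes into LString.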

The second method seems to make more sense, but it means that LString
must know the details of the encoding. That is not what we want, because
encoding details should not be dealt with outside the Encoding class.

The first method, on the other hand, wastes memory with 16-bit characters,
since every input byte occupies a full LChar (48 bits for the 3 bytes
above). So it seems we have no good solution. Why? My answer is that we
shouldn't use LString here at all. We should use a byte stream, by which
I mean 'char *'. The data flow inside LyX would then be

          read system call          convertToUnicode            LyX algorithms
    file ==================> char* ==================> LString ================> LString
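A byte-stream interface could look roughly like the following sketch.
Everything here is hypothetical: the function name convertUtf8ToUnicode,
the [first, last) 'char *' range in place of an LString, and
std::u32string in place of wstring (so each code point fits in one unit
whatever the width of wchar_t). UTF-8 is used only because it can be
decoded without a mapping table; a Big5 version would need a
Big5-to-Unicode table, which is omitted. Mirroring the pair returned by
the current convertToUnicode, the second member here is assumed to mark
where decoding stopped, so an incomplete multi-byte sequence at the end
of one read() can be completed by the next.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical: decode UTF-8 bytes in [first, last). Returns the decoded text
// and a pointer to the first byte that was not consumed.
// Simplification: overlong forms and surrogate code points are not rejected.
std::pair<std::u32string, const char*>
convertUtf8ToUnicode(const char* first, const char* last) {
    std::u32string out;
    const char* p = first;
    while (p != last) {
        const unsigned char b = static_cast<unsigned char>(*p);
        int len = 0;  // length of the sequence starting at p
        if (b < 0x80) len = 1;
        else if ((b & 0xE0) == 0xC0) len = 2;
        else if ((b & 0xF0) == 0xE0) len = 3;
        else if ((b & 0xF8) == 0xF0) len = 4;
        if (len == 0) {  // stray continuation byte or invalid lead byte
            out.push_back(0xFFFD);
            ++p;
            continue;
        }
        if (last - p < len) break;  // incomplete tail: leave it for the next read
        char32_t cp = (len == 1) ? b : (b & (0x7F >> len));  // payload bits of lead byte
        bool ok = true;
        for (int k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(p[k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) {
            out.push_back(0xFFFD);
            ++p;
            continue;
        }
        out.push_back(cp);
        p += len;
    }
    return {out, p};
}

void checkExample() {
    // 'A', then U+00E9 (0xC3 0xA9), then the first byte of a 3-byte sequence.
    const char bytes[] = {'A', '\xC3', '\xA9', '\xE4'};
    const auto result = convertUtf8ToUnicode(bytes, bytes + sizeof bytes);
    assert(result.first == U"A\u00E9");
    assert(result.second == bytes + 3);  // the lone 0xE4 waits for more input
}
```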

The disadvantage of this approach is that we need to convert from Unicode
to the screen font encoding whenever we want to display text. That is a
real disadvantage. Therefore, another possibility is

          read system call          convertToLString            LyX algorithms
    file ==================> char* ==================> LString ================> LString

In this approach we no longer need Unicode at all, and the LString could
be passed to X function calls directly, because X supports fixed-width
font encodings for many one- and two-byte encodings.
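To illustrate why no conversion table is needed in this approach, the
sketch below splits each 16-bit LChar into the two-byte structure that
Xlib's XDrawString16 takes. To stay compilable without the X headers, it
declares a local Char2b struct mirroring Xlib's XChar2b (byte1 holds the
high byte, byte2 the low byte); the names Char2b and toChar2b are
hypothetical. It also assumes the run consists only of two-byte Big5
characters drawn with a Big5 font; single-byte ASCII runs would in
practice be drawn with a separate font.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using LChar = std::uint16_t;  // assumption: one Big5 character per 16-bit LChar

// Local mirror of Xlib's XChar2b (byte1 = high byte, byte2 = low byte),
// declared here so the sketch compiles without <X11/Xlib.h>.
struct Char2b {
    unsigned char byte1;
    unsigned char byte2;
};

// Split each LChar into the two-byte form XDrawString16 expects.
// No code-conversion table is involved: the bytes are passed through as-is.
std::vector<Char2b> toChar2b(const std::vector<LChar>& s) {
    std::vector<Char2b> out;
    out.reserve(s.size());
    for (LChar c : s)
        out.push_back(Char2b{static_cast<unsigned char>(c >> 8),
                             static_cast<unsigned char>(c & 0xFF)});
    return out;
}

void checkExample() {
    // 0xa140 is the second character of the example byte stream.
    const std::vector<Char2b> xs = toChar2b({0xa140});
    assert(xs.size() == 1 && xs[0].byte1 == 0xa1 && xs[0].byte2 == 0x40);
}
```

With Xlib, the resulting array would be handed to XDrawString16 together
with its length; only a byte split is involved, not a table lookup.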

Therefore, from my point of view, the current design of Encoding does not
seem to meet these requirements. I hope I'm wrong.

Yu-Chung Wang
