Re: Internalization

Seak, Teng-Fong Fri, 17 Dec 1999 00:19:13 -0800
Shigeru Miyata wrote:

> > Actually, newer versions of XFree86 already support Unicode natively, as
> > do a lot of commercial X servers (if not all).
>
> It also introduces Big5 support.  Other multibyte encodings have been
> supported for quite some time.

     Yup, I know that.  I just mentioned Unicode because I wanted to
highlight its usage. :-)

> > So I suppose characters (eg in menus) are displayed using Unicode.
>
> What gain do we have?

     For LyX, Latin-1 languages and maybe even other 8 bit encoded languages,
I'm afraid the gain is quite small.  One obvious advantage is that there's no
need to translate menu/message strings to Unicode from their "original"
encoding.  But for other languages, the benefit should be greater.  For
traditional Chinese, Big5 isn't the only encoding: there are also EUC-TW,
XCN-11643-x and some extensions created to suit daily usage in HK.  In a
word, it's a mess.  I don't know much about Japanese, but I know that
there are JIS, EUC-JP and Shift JIS.  It seems that one of them is used in
Windows while another one is for Unix/X, right?  Auto JIS is the automatic
recognition, right?  Could it do the job flawlessly?  Don't you find it
frustrating to have so many different encodings?  The same situation exists
for other languages like Greek and Russian, in which one encoding is for
Unix/X and another is for Windows.
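
     To make the point concrete, here is a minimal Python sketch that encodes
the same short strings under several encodings and prints the resulting
bytes.  The helper name encode_all() is made up for illustration, the codec
names are Python's own, and taking "iso2022_jp" to stand for what I called
"JIS" above is my assumption.

```python
# Sketch: one string, several encodings, different bytes each time.
# encode_all() is a hypothetical helper; the codec names are Python's.

def encode_all(text, codecs):
    """Return {codec name: encoded bytes} for the given text."""
    return {name: text.encode(name) for name in codecs}

# Assumption: "iso2022_jp" is used here for the 7 bit "JIS" encoding.
japanese = encode_all("日本語", ["iso2022_jp", "euc_jp", "shift_jis", "utf-8"])
russian = encode_all("Привет", ["koi8_r", "cp1251", "utf-8"])

for sample in (japanese, russian):
    for name, raw in sample.items():
        # Print each byte sequence in hex so the differences are visible.
        print(f"{name:12} {raw.hex()}")
```

Each codec gives a different byte sequence for the same text, so a program
that receives the bytes has to know (or guess) which encoding was used.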

     As there are a significant number of LyX users under Windows, some of
them might want to contribute to the translation too.  Imagine the trouble
they would encounter when some of them edit the translation file under Unix
while others edit it under Windows.  A little carelessness about encoding
will mess up the whole translation file, which is a very big file.  Luckily,
CVS might help them to save some of their work, but certainly not all.
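
     A small Python sketch of how that can go wrong, assuming a Russian
translation where one contributor saves in KOI8-R and another in CP1251
(the two strings and the variable names are invented for the example):

```python
# Sketch: two translators append lines to the same file in different
# encodings.  The strings and variable names are illustrative only.

line_from_unix = "Файл".encode("koi8_r")       # saved under Unix/X
line_from_windows = "Правка".encode("cp1251")  # saved under Windows
mixed = line_from_unix + b"\n" + line_from_windows

# No single codec reads both lines back correctly.
as_koi8 = mixed.decode("koi8_r").split("\n")
as_cp1251 = mixed.decode("cp1251").split("\n")

print("read as koi8_r:", as_koi8)
print("read as cp1251:", as_cp1251)
```

Neither decoding raises an error, so the damage goes unnoticed until somebody
actually reads the garbled strings on screen.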

> > However, people still use Big5 as
> > traditional Chinese encoding.  Conversion from Big5 to and from Unicode
> > isn't quite as straightforward as from Latin-1 to and from Unicode.  If the
> > translation is done in Big5, there will be a lot of unnecessary
> > conversion Big5 <-> Unicode.
>
> On the contrary.  In order to communicate with existing applications,
> people continue using traditional encodings as document languages.
> So if we use Unicode here, then we will have to have unnecessary
> conversions.

     I was talking about the encoding of menu/message strings.  So there
shouldn't be any problem because we can't cut and paste a menu item :-)

     For document encoding, you're surely correct.  Take Linux as an example:
isn't the conversion done by the kernel?  It seems to be so, at least for
consoles, since the kernel uses Unicode internally.

> >                               That's why I ask if it's better to use
> > Unicode as the underlying encoding.  I take Chinese as an example, but
> > the argument can be very well applied to Japanese, Korean and any
> > language encodings other than Latin-1.
> [...]
> > And how are .po files saved?
>
> Any encoding you like as far as it is a superset of 7 bit ASCII.
> (utf-8, EUC, Big5, Shift-JIS, KS...)

     By "superset of 7 bit ASCII", do you mean that every byte is 7 bit?  If
yes, UTF-7 should be used instead.  On the other hand, Big5 couldn't be used
because the encoding is 8 bit based.
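
     The two possible readings of that phrase can be checked with a short
Python sketch.  The function names are hypothetical, and the first check only
tries one sample string, so it is a quick test rather than a proof about the
whole encoding:

```python
# Sketch: two readings of "superset of 7 bit ASCII".
# Function names are hypothetical; the codec names are Python's.

def is_ascii_compatible(codec, sample="Open File"):
    """Reading 1: plain ASCII text keeps the same bytes in this codec.
    Only the given sample is checked, not the full ASCII range."""
    return sample.encode(codec) == sample.encode("ascii")

def is_seven_bit(raw):
    """Reading 2: every byte of the encoded data is below 0x80."""
    return all(byte < 0x80 for byte in raw)

codecs = ["utf-8", "euc_jp", "big5", "shift_jis", "euc_kr"]
compat = {c: is_ascii_compatible(c) for c in codecs}

chinese = "中文"
seven_bit = {c: is_seven_bit(chinese.encode(c))
             for c in ("utf-7", "utf-8", "big5")}

print(compat)
print(seven_bit)
```

Under the first reading all of the listed encodings qualify, Big5 included;
under the second, only UTF-7 does among the three tried here.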

> However, FYI, XForms cannot draw 16 bit character strings.

     I just remembered that in the CLE (Chinese Linux Extension) package,
they managed to display Chinese in LyX.  Or is it KLyX?

     Seak