Re: Unicode hindi?

Miyata Shigeru Wed, 09 May 2001 00:58:56 -0700
Aditya Gilra <[EMAIL PROTECTED]> wrote:

> Could you inform how long it is before lyx supports
> hindi/devanagari pages in unicode. I learn that
> unicode support is present but I couldn't see how
> in 1.1.6 . Do I need to get the development version
> for it.

It is unlikely to be supported unless someone with knowledge
of the language/script joins the development.

However, IMHO it is worthwhile to summarize the problems likely to
arise with Indic scripts, so that the design is prepared to accept
a patch adding this kind of support.

Correct me if I am wrong.
The main difficulty in handling Indic/SE-Asian scripts lies in the
fact that each "character", as the users of those languages perceive
it, consists of several Unicode characters.  An added difficulty,
except in Thai/Lao, arises because the Unicode characters which
make up a "character(=grapheme)" are stored as a string in
pronunciation order, which is completely independent of the
order in which they are drawn.  cf. Figure 2-3. of the Unicode Book
version 3.
(In Thai and Lao, character strings are stored in the order in which
they are drawn, i.e., L to R.  Hence it is possible to render these
languages on screen with the current version of LyX if fonts with
proper metrics are installed, although kerning and ligatures are
ignored.)
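
To make the storage-order point concrete, here is a small Python sketch
(an illustration only, not LyX code; the names describe, DEVANAGARI_KI
and THAI_KE are made up for this example).  It lists the code points of
one Devanagari syllable and one Thai syllable in the order they are
stored.  In the Devanagari case the vowel sign is drawn to the left of
the consonant but stored after it; in the Thai case the stored order
already matches the drawing order.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs in stored order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Devanagari "ki": the vowel sign I is drawn LEFT of the consonant KA,
# yet it is stored AFTER it (pronunciation order).
DEVANAGARI_KI = "\u0915\u093F"

# Thai "ke": the vowel SARA E is drawn left of KO KAI and is also
# stored first (drawing order, left to right).
THAI_KE = "\u0E40\u0E01"

for label, text in (("Devanagari ki", DEVANAGARI_KI), ("Thai ke", THAI_KE)):
    print(label)
    for code_point, name in describe(text):
        print("  ", code_point, name)
```

A renderer that simply draws the stored code points left to right
therefore works for the Thai string but puts the Devanagari vowel sign
on the wrong side of its consonant.
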
Now three areas of the user interface must be addressed:
-Rendering on Screen
  This is a job for LyX.  First, the characters in each grapheme must
  be rearranged into drawing order, and then kerning and ligatures
  must be resolved.  LyX already performs character rearrangement
  for R to L languages (Hebrew and Arabic).  But I don't think it
  is practical to develop such a mechanism for all languages by
  ourselves.  Rather, a library developed by someone else should
  be used.  Fortunately there already are 2 internationalized layout
  engines, Pango and ICU.  They both perform their tasks pretty well.
  The problem here is that Pango is written in C and ICU is written
  in the stone age C++, as Lars said.
-Rendering on Printer
  This is a job for TeX.  For Devanagari, there is a preprocessor
  you can find at CTAN which rearranges character strings in TeX
  source files so that the rest of the job can be handled just fine
  by TeX compilers.  Notice that Thai is already supported in LyX.
  Although Thai does not need its characters rearranged for the
  source files to be processed by a TeX compiler, it requires
  another kind of preprocessor which inserts indicators at possible
  line-break points.  So I bet Dekel's file format converter is
  already powerful enough to call a preprocessor automatically
  before running LaTeX.
-Editing
  "Grapheme should behave as units in terms of mouse selection,
  arrow key movement, backspacing and so on." (The Unicode Book
  Ch.5 section 15)  -- Well, in fact, for Thai/Lao, delete-forward
  should treat one grapheme as a unit, while delete-backward
  should treat one grapheme as a composite of multiple (Unicode)
  characters and delete them one by one. --
  No such mechanism exists in LyX yet, and we must consider
  how to implement it.
  A similar situation exists in the current CJK patch, where text
  data are stored internally as variable length multibyte strings
  rather than wide character strings.  See
http://www.mail-archive.com/lyx-devel@lists.lyx.org/msg19155.html
  and my reply
http://www.mail-archive.com/lyx-devel@lists.lyx.org/msg19183.html
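
The editing behaviour described above can be sketched in a few lines of
Python.  This is only a rough illustration under stated assumptions, not
a proposal for the LyX code: the function names are invented here, and
the clustering rule (a base character plus any following combining
marks) is a simplification of the real grapheme boundary rules.  In
particular it does not merge Devanagari conjuncts joined by a virama.

```python
import unicodedata

# General categories of combining marks (nonspacing, spacing, enclosing).
MARK_CATEGORIES = ("Mn", "Mc", "Me")


def clusters(text):
    """Split text into approximate graphemes: a base plus trailing marks."""
    result = []
    for ch in text:
        if result and unicodedata.category(ch) in MARK_CATEGORIES:
            result[-1] += ch      # attach the mark to the preceding base
        else:
            result.append(ch)     # start a new cluster
    return result


def delete_forward(text, pos):
    """Delete the whole cluster that starts at code point offset pos."""
    offset = 0
    for cluster in clusters(text):
        if offset == pos:
            return text[:pos] + text[pos + len(cluster):]
        offset += len(cluster)
    return text  # pos is at the end or not on a cluster boundary


def delete_backward(text, pos):
    """Delete only the single code point just before offset pos."""
    if pos <= 0:
        return text
    return text[:pos - 1] + text[pos:]


# Thai: KO KAI + SARA II + MAI EK (one grapheme), followed by KHO KHAI.
sample = "\u0E01\u0E35\u0E48\u0E02"
print(len(clusters(sample)))                   # clusters in the sample
print(len(delete_forward(sample, 0)))          # code points left after delete
print(len(delete_backward(sample, 3)))         # code points left after backspace
```

With the cursor in front of the first grapheme, delete-forward removes
all three of its code points at once, while backspacing from just after
it removes only the tone mark.  Cursor movement and mouse selection
would use the same cluster boundaries.
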

The Unicode Book online is at
http://www.unicode.org/unicode/uni2book/u2.html

Regards,
        SMiyata