Aditya Gilra <[EMAIL PROTECTED]> wrote:
> Could you inform how long it is before lyx supports
> hindi/devanagari pages in unicode. I learn that
> unicode support is present but I couldn't see how
> in 1.1.6 . Do I need to get the development version
> for it.
It is unlikely to be supported unless someone who knows the
language and script joins the development.
However, IMHO it is worthwhile to summarize the problems that Indic
scripts pose, so that the design is ready to accept a patch adding
such support.
Correct me if I am wrong.
The main difficulty in handling Indic/SE-Asian scripts lies in the
fact that each "character", as the users of those languages perceive
it, consists of several Unicode characters. An added difficulty,
except in Thai/Lao, is that the Unicode characters which make up
a "character" (= grapheme) are stored as a string in pronunciation
order, which is completely independent of the depicting order.
cf. Figure 2-3 of the Unicode Book version 3.
(In Thai and Lao, character strings are stored in the depicting
order, i.e., L to R. Hence it is possible to render these languages
on screen with the current version of LyX if fonts with proper
metrics are installed, although kerning and ligatures are ignored.)
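
To make the storage order concrete, here is a small Python sketch (my
own illustration, not LyX code) that lists the code points of one
Devanagari syllable and one Thai syllable. The helper name
code_points is made up for this example. In the Devanagari case the
vowel sign is stored after the consonant although it is drawn to its
left; in the Thai case the stored order already matches the drawn
order.

```python
import unicodedata

def code_points(text):
    """Return (code point, Unicode name) pairs in storage order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Devanagari "ki": stored as KA, then VOWEL SIGN I,
# but VOWEL SIGN I is drawn to the LEFT of KA.
devanagari_ki = "\u0915\u093F"

# Thai "ke": stored as SARA E, then KO KAI,
# which is also the left-to-right order on screen.
thai_ke = "\u0E40\u0E01"

for label, text in (("Devanagari", devanagari_ki), ("Thai", thai_ke)):
    print(label)
    for cp, name in code_points(text):
        print("  ", cp, name)
```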
Now 3 areas in the user interface must be addressed:
-Rendering on Screen
This is a job of LyX. First, the characters in each grapheme must be
rearranged into the depicting order, and then kerning and ligatures
must be resolved. LyX already performs the character rearrangement
for R to L languages (Hebrew and Arabic). But I don't think it
is practical to develop such a mechanism for all languages by
ourselves. Rather, a library developed by someone else should
be used. Fortunately there already are 2 internationalized layout
engines, Pango and ICU. They both perform their tasks pretty well.
The problem here is that Pango is written in C and ICU is written
in stone-age C++, as Lars said.
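
Just to illustrate what "rearranging into the depicting order" means,
here is a deliberately naive Python sketch. It handles only one case,
the Devanagari vowel sign I (U+093F), and moves it in front of the
consonant or virama-joined conjunct it follows. The function name and
the whole approach are my own simplification; a real layout engine
such as Pango or ICU covers far more (reph, split vowels, ligatures,
other scripts).

```python
PREBASE_SIGNS = {"\u093F"}   # DEVANAGARI VOWEL SIGN I (only case handled)
VIRAMA = "\u094D"            # DEVANAGARI SIGN VIRAMA

def to_visual_order(text):
    """Toy reordering: move a pre-base vowel sign before its cluster."""
    out = []
    for ch in text:
        if ch in PREBASE_SIGNS and out:
            # Start at the last consonant, then step back over any
            # "consonant + virama" pairs that belong to the same conjunct.
            i = len(out) - 1
            while i >= 2 and out[i - 1] == VIRAMA:
                i -= 2
            out.insert(i, ch)
        else:
            out.append(ch)
    return "".join(out)

# "ki": KA + SIGN I  ->  SIGN I + KA
print([f"U+{ord(c):04X}" for c in to_visual_order("\u0915\u093F")])
```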
-Rendering on Printer
This is a job of TeX. For Devanagari, there is a preprocessor
available from CTAN which rearranges character strings in TeX
source files so that the rest of the job can be handled just fine
by TeX compilers. Note that Thai is already supported in LyX.
Although Thai does not need rearrangement of characters for the
source files to be processed by a TeX compiler, it requires
another kind of preprocessor, which inserts indicators at points
where a line may break. So I bet Dekel's file format converter is
already powerful enough to call a preprocessor automatically
before running LaTeX.
-Editing
"Grapheme should behave as units in terms of mouse selection,
arrow key movement, backspacing and so on." (The Unicode Book
Ch.5 section 15) -- Well, in fact, for Thai/Lao, delete-forward
should treat one grapheme as a unit, while delete-backward
should treat one grapheme as a composite of multiple (Unicode)
characters and delete them one by one. --
No such mechanism exists in LyX yet, and we must consider
how to implement it.
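
As a rough sketch of the editing behaviour described above (Python,
not LyX code, with function names of my own invention): a grapheme is
approximated here as one base character plus any following combining
marks, which is much cruder than the real Unicode segmentation rules.
delete_forward removes the whole grapheme, while delete_backward
removes a single Unicode character.

```python
import unicodedata

def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def cluster_end(text, pos):
    """Index just past the (approximate) grapheme starting at pos."""
    end = pos + 1
    while end < len(text) and is_mark(text[end]):
        end += 1
    return end

def delete_forward(text, pos):
    """Delete the whole grapheme after the cursor; cursor stays put."""
    if pos >= len(text):
        return text, pos
    return text[:pos] + text[cluster_end(text, pos):], pos

def delete_backward(text, pos):
    """Delete one Unicode character before the cursor."""
    if pos <= 0:
        return text, pos
    return text[:pos - 1] + text[pos:], pos - 1

# Thai KO KAI + SARA II + MAI EK: one grapheme, three characters.
word = "\u0E01\u0E35\u0E48"
print(delete_forward(word, 0))   # whole grapheme removed
print(delete_backward(word, 3))  # only the tone mark removed
```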
A similar situation exists in the current CJK patch where text
data are stored internally as variable length multibyte strings
rather than wide character strings. See
http://www.mail-archive.com/lyx-devel@lists.lyx.org/msg19155.html
and my reply
http://www.mail-archive.com/lyx-devel@lists.lyx.org/msg19183.html
The Unicode Book online is at
http://www.unicode.org/unicode/uni2book/u2.html
Regards,
SMiyata