Jutta, this is getting really interesting from the UTF-8 point of
view. Thank you for your findings. :)
On 07/08/2005, at 7:12 AM, Jutta Wrage wrote:
The following order is correct, but the final display on any document
has mistakes.
Maybe a new discovery I made helps here:
I visited http://www.linux-india.org/ with two different browsers.
The page is broken, but the important thing can be seen:
The third line on the right has one of the vowel signs in question
applied, too.
I looked at the page with two OS X apps, but that should be no problem.
- Omniweb shows the f-like character behind the first letter.
- Safari shows it in front of the first letter.
So it must be more of an application problem. If I paste it into a mail,
both of them look the same:
लिनक्स-इन्डिया में
आपका स्वागत है।
I see the f-like thingy at the beginning of the line here, when
writing the mail. That is how Safari shows it, not how Omniweb
shows it.
I think the same might happen with different X applications.
The problem now is to find out where the vowel sign should be
placed. Then one can file bugs. ;-)
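A small sketch that may help with the "where should it be placed" question: it lists the stored code points of the first word of the greeting above. As far as I understand the Unicode Standard, the Devanagari vowel sign I is stored after its consonant (logical order) and the renderer is expected to draw it to the consonant's left, which would match what Safari does. The string is written with escapes here, and the variable names are just for illustration.

```python
import unicodedata

# First word of the greeting quoted above, spelled with escapes so the
# stored (logical) order is unambiguous however a mail client draws it.
word = "\u0932\u093f\u0928\u0915\u094d\u0938"

names = [unicodedata.name(ch) for ch in word]
for ch, name in zip(word, names):
    # Print code point, general category, and character name.
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {name}")
```

The second line printed is the vowel sign (U+093F), so in the data it follows the letter LA even though it is drawn in front of it.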
The really interesting thing here is that this eliminates the
decomposed/precomposed Unicode bug in this case: both Safari and
OmniWeb are Cocoa applications, and thus will display both decomposed
and precomposed Unicode appropriately.
I still need to test this more with svashka's languages, although
they have the same combined-diacritic issues that mine does.
Undoubtedly she should be using a precomposed layout, and I really
wonder whether the charmap _is_ a precomposed layout, since the
position of diacritics varies between apps. That tends to be an
artifact of decomposed input, where the character is not input as one
whole character: the vowel and accents are input separately, and
thus can become separated during display. In my becoming-famous case,
unanchored accents even chase the cursor around the page!
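To make the decomposed/precomposed distinction concrete, here is a minimal sketch using Python's standard unicodedata module. It takes one Vietnamese letter as an example (my choice, not something from the thread), converts it to decomposed (NFD) and back to precomposed (NFC) form, and then checks the Devanagari pair from Jutta's sample, which has no precomposed form at all. The helper name codepoints is only illustrative.

```python
import unicodedata

def codepoints(s):
    """Return the code points of s as a readable string."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# Precomposed Vietnamese letter: e with circumflex and dot below.
viet = "\u1ec7"
nfd = unicodedata.normalize("NFD", viet)  # base letter + two combining marks
nfc = unicodedata.normalize("NFC", nfd)   # recomposed into one code point

print("precomposed:", codepoints(viet))
print("decomposed: ", codepoints(nfd))
print("recomposed: ", codepoints(nfc))

# Devanagari LA + VOWEL SIGN I: normalization leaves the pair unchanged,
# because no single precomposed code point exists for it.
deva = "\u0932\u093f"
print("Devanagari unchanged by NFC:", unicodedata.normalize("NFC", deva) == deva)
```

If the Devanagari pair really is unaffected by normalization, that fits the point above: the differing display in Safari and OmniWeb would be a rendering matter rather than a decomposed/precomposed one.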
Danilo Šegan of Gnome-i18n has come through with some excellent info:
http://indlinux.org/ sounds like a good starting point.
This is wonderful: it looks like they have an entire Linux distro for
Indic languages.
There is also the linux-utf8 list, and the xkeyboard-config list for
development of keyboard layout maps.
However, if input of Sanskrit and Marathi requires "input method"
support, you might want to look into different input method mechanisms
(XIM, Gtk+ IM, ...).
Thank you for any help you can offer with this. Is there a list where
one should discuss Unicode input and display?
Not a centralised one, no, but linux-utf8 sounds like a good starting
point:
http://mail.nl.linux.org/linux-utf8/
and this list looks like an excellent resource for these sorts of
problems.
We'll collect the info that comes out of this investigation
and post a summary here as well as on the appropriate i18n lists,
since this was originally a D-W translator enquiry. :)
Please continue to contribute your experience in this area: all
Unicode Level 1->2 and combined diacritics experience, in particular,
is very welcome.
Jutta, are you a member of the omniweb-l list? If not, I'll post
there as well, and ask about the varying display. It's a particularly
useful clue... and the Omniweb people are always very responsive and
skilled in these issues.
from Clytie (vi-VN, Vietnamese free-software translation team / nhóm
Việt hóa phần mềm tự do)
http://groups-beta.google.com/group/vi-VN