Re: Even more Hebrew / Bidi / Encoding Woes (2/2)

Dov Feldstern Sat, 07 Apr 2007 12:08:02 -0700

Dov Feldstern wrote:

Abdelrazak Younes wrote:
Dov Feldstern wrote:
The re-ordering of characters should _exclusively_ be based on the Unicode range, not on the language. Fixing the bidi algorithm to do that should not be very difficult.
This is certainly an option; however, I am far from convinced that this is the correct behavior. I'd be happy to hear additional opinions... (Georg?)
Abdel.

I feel I should explain *why* I'm always so adamant that the explicit language information is important. Admittedly, the example I'm about to give is rather esoteric; but then again, that's precisely where the little annoyances arise, and after all, LaTeX is all about getting things "just right", isn't it? And the example is something I've really wanted to do (and have been able to do with LyX, but not with regular word processors), not something totally fabricated...

So here it is (uppercase will represent an RTL language, lowercase an LTR one). First of all, if I were to print an LTR list, it would have the form "xxx, yyy, zzz" (i.e., immediately to the right of each word comes the comma, then a space, then the next word). An RTL list, on the other hand, would look (visually) like "ZZZ ,YYY ,XXX" (the first word, XXX, is on the right; immediately to its left comes a comma, and only then a space, then the next word, and so on). Just to clarify the differences, I'll print them in two rows (note especially the positions of the commas):


LTR: xxx, yyy, zzz
RTL: ZZZ ,YYY ,XXX

So far so good. Now, let's say I want a list of RTL words in an overall LTR sentence. There are situations in which, *to me*, it makes more sense for the overall structure to remain LTR. In other words, I want something that looks like this:


"this is the list: CBA, FED, IHG..."

Note that each RTL word is of course in RTL order, but the order of the words, as well as the positioning of the commas, is LTR. I challenge you to try that in OpenOffice or MS Word: it's just not possible, because the bidi algorithm decides that the entire string from A to I is RTL, and therefore renders it as


"IHG ,FED ,CBA"

That is *not* what I want. (To be sure, you could actually get that output even in OO/Word, basically by typing backwards; in other words, you have to mess up the *logical* order in order to get the correct *visual* output. But I find that unacceptable, and if the list is very long, it becomes unmanageable.) In LyX, I *can* do it without typing everything backwards, precisely because I have the explicit language mechanism.
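To make the failure concrete, here is a toy model in Python of what an implicit bidi algorithm does with such a list. It is a heavily simplified sketch, not the real Unicode Bidirectional Algorithm (UAX #9): as in the examples above, uppercase ASCII letters stand in for strong RTL characters, lowercase letters for strong LTR ones, and everything else is treated as neutral. The paragraph is assumed to be LTR, and only the neutral-resolution rules (N1/N2) and the final reversal (L2) are modeled. All function names are hypothetical, and `display_explicit` is only a rough stand-in for the effect of explicit language tagging, not LyX's actual implementation.

```python
def bidi_class(ch: str) -> str:
    """Toy classification: uppercase = strong RTL ('R'), lowercase = strong LTR ('L'),
    anything else (spaces, commas, periods, colons) = neutral ('N')."""
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"


def resolve_levels(text: str) -> list[int]:
    """Assign embedding levels in an LTR paragraph (base level 0), following a
    simplified version of the UBA neutral rules N1/N2 and implicit rule I1."""
    classes = [bidi_class(c) for c in text]
    n = len(classes)
    resolved = classes[:]
    i = 0
    while i < n:
        if classes[i] != "N":
            i += 1
            continue
        # Find the maximal run of neutrals starting at i.
        j = i
        while j < n and classes[j] == "N":
            j += 1
        # Start and end of paragraph count as the paragraph direction (L).
        before = classes[i - 1] if i > 0 else "L"
        after = classes[j] if j < n else "L"
        # N1: neutrals between two strong types of the same direction take that direction.
        # N2: otherwise they take the embedding direction (L here).
        fill = before if before == after else "L"
        for k in range(i, j):
            resolved[k] = fill
        i = j
    # I1: at an even (LTR) base level, R characters go up one level.
    return [1 if c == "R" else 0 for c in resolved]


def reorder(text: str, levels: list[int]) -> str:
    """Simplified L2 for two levels: reverse every maximal run of level-1 characters."""
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and levels[j] == levels[i]:
            j += 1
        chunk = text[i:j]
        out.append(chunk[::-1] if levels[i] == 1 else chunk)
        i = j
    return "".join(out)


def display_implicit(text: str) -> str:
    """Visual order when direction is guessed from the characters alone."""
    return reorder(text, resolve_levels(text))


def display_explicit(segments: list[tuple[str, str]]) -> str:
    """Hypothetical model of explicit language tagging: each segment carries its own
    direction, so only the letters inside an RTL segment are reversed, while the
    punctuation between segments stays in the LTR flow."""
    return "".join(seg[::-1] if direction == "rtl" else seg for seg, direction in segments)


if __name__ == "__main__":
    # Logical order as typed: "ABC", "DEF", "GHI" are the RTL words.
    print(display_implicit("this is the list: ABC, DEF, GHI..."))
    # prints: this is the list: IHG ,FED ,CBA...
    print(display_explicit([
        ("this is the list: ", "ltr"), ("ABC", "rtl"), (", ", "ltr"),
        ("DEF", "rtl"), (", ", "ltr"), ("GHI", "rtl"), ("...", "ltr"),
    ]))
    # prints: this is the list: CBA, FED, IHG...
```

In the implicit case, each comma and space sits between two RTL letters, so rule N1 resolves it as RTL; the whole list becomes a single RTL run and is reversed as a unit. In the explicit model the punctuation is tagged as LTR, so only the letters inside each word are reversed and the list keeps its LTR structure.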

I admit that it's debatable whether what I want is correct or not (I don't know if there are any rules about this), but nonetheless, I want to be able to do it that way if I choose to.

There are other similar cases, but they all have in common that they revolve around the "ambiguous" characters: punctuation marks, for example, and perhaps digits, which really are ambiguous in the sense that they are not inherently either RTL or LTR. The bidi algorithms do a good job of guessing what the user wants *usually*, but sometimes they foul up; and it's not their fault. It's a real ambiguity, which can only be resolved by *explicitly* disambiguating it... In LyX, we already have that built in, so it's a shame to throw it away...
