Dov Feldstern wrote:
Abdelrazak Younes wrote:
Dov Feldstern wrote:
The re-ordering of characters should _exclusively_ be based
on the Unicode range, not on the language. Fixing the bidi algorithm to
do that should not be very difficult.
This is certainly an option, however I am far from convinced that this
is the correct behavior. I'd be happy to hear additional opinions...
(Georg?)
Abdel.
I feel I should explain *why* I'm always so adamant about claiming that
the explicit language information is important. Admittedly, the example
I'm about to bring is rather esoteric --- but then again, that's
precisely where the little annoyances arise, and after all, LaTeX is all
about getting things "just right", isn't it? And the example I'm
bringing is something that I've really wanted to do (and been able to do
with LyX, but not with regular word-processors), not totally fabricated...
So here it is (uppercase will represent an RTL language, lowercase --
LTR). First of all, if I were to print an LTR list, it would have the
form "xxx, yyy, zzz" (i.e., immediately to the right of each word comes
the comma, then a space, then the next word). An RTL list, on the other
hand, would have the form (visually) of: "ZZZ ,YYY ,XXX" (the first word
XXX is on the right, followed to the left of it by a comma and only then
a space, then the next word, and so on). Just to clarify the
differences, I'll print them in two rows (note especially the positions
of the commas):
LTR: xxx, yyy, zzz
RTL: ZZZ ,YYY ,XXX
So far so good. Now, let's say I want a list of RTL words in an overall
LTR sentence. There are situations in which *to me* it makes more sense
that the overall structure should remain LTR. In other words, I want
something that looks like this:
"this is the list: CBA, FED, IHG..."
--- note that each RTL word is of course in RTL order, but the order of
the words, as well as the positioning of the commas, is LTR. I challenge
you to try that in OpenOffice or MS Word: it's just not possible,
because the bidi algorithm decides that the entire string from A to G is
all RTL, and therefore renders it as
"IHG ,FED ,CBA"
which is *not* what I want. (To be sure, you could actually get that
output even in OO/Word, basically by typing backwards --- in other
words, you have to mess up the *logical* order in order to get the
correct *visual* output; but I find that unacceptable, and if the list
is very long, it becomes unmanageable.) In LyX, I *can* do it without
typing everything backwards, precisely because I have the explicit
language mechanism.
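
To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is a toy model under stated assumptions, not the real Unicode bidi algorithm and not LyX's code: uppercase stands for RTL and lowercase for LTR as in the example above, the paragraph is LTR, a run of neutral characters takes the direction of its neighbours only when both agree, and all function names are made up for illustration.

```python
def classify(ch):
    """Toy convention from the example: uppercase = RTL, lowercase = LTR."""
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"  # punctuation, spaces, digits: no inherent direction


def resolve_neutrals(dirs, base="L"):
    """Neutral runs take their neighbours' direction if both agree, else base."""
    out = list(dirs)
    i, n = 0, len(out)
    while i < n:
        if out[i] != "N":
            i += 1
            continue
        j = i
        while j < n and out[j] == "N":
            j += 1
        before = out[i - 1] if i > 0 else base
        after = out[j] if j < n else base
        out[i:j] = [before if before == after else base] * (j - i)
        i = j
    return out


def visual_order(text, dirs):
    """Reverse every maximal run of RTL characters inside an LTR paragraph."""
    pieces = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and dirs[j] == dirs[i]:
            j += 1
        run = text[i:j]
        pieces.append(run[::-1] if dirs[i] == "R" else run)
        i = j
    return "".join(pieces)


def render_by_character_class(text):
    """Direction guessed from the characters alone (word-processor style)."""
    return visual_order(text, resolve_neutrals([classify(c) for c in text]))


def render_by_explicit_language(spans):
    """Direction taken from an explicit tag on each span (hypothetical format)."""
    text = "".join(s for s, _ in spans)
    dirs = [d for s, d in spans for _ in s]
    return visual_order(text, dirs)


logical = "this is the list: ABC, DEF, GHI..."
tagged = [("this is the list: ", "L"), ("ABC", "R"), (", ", "L"),
          ("DEF", "R"), (", ", "L"), ("GHI", "R"), ("...", "L")]

print(render_by_character_class(logical))   # commas get pulled into the RTL run
print(render_by_explicit_language(tagged))  # commas stay LTR, as tagged
```

In both calls the logical order typed by the user is the same; only the source of the direction information differs. In the first call the ", " separators sit between two RTL words and are absorbed into one long RTL run; in the second they are explicitly tagged LTR, so each word is reversed on its own.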
I admit that it's debatable whether what I want is correct or not --- I
don't know if there are any rules about this --- but nonetheless, I want
to be able to do it that way if I choose to.
There are other similar cases --- but they are all similar in that they
revolve around the "ambiguous" characters --- punctuation marks, for
example, and perhaps digits --- which really are ambiguous in the sense
that they are not inherently either RTL or LTR. The bidi algorithms
usually do a good job of guessing what the user wants, but sometimes they
get it wrong, and it's not their fault: it's a real ambiguity, which can
only be resolved by *explicitly* disambiguating it. In LyX, we already
have that built in, so it would be a shame to throw it away.
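
As a small illustration of which characters are "ambiguous" in this sense, Python's standard `unicodedata` module reports the bidirectional class that the Unicode character database assigns to each character. The sketch below treats only the classes L, R and AL as strong; the `is_strong` helper and the sample list are chosen here for illustration.

```python
import unicodedata

# Bidi classes that carry an inherent direction of their own.
STRONG = {"L", "R", "AL"}


def is_strong(ch):
    """True if the character has an inherent direction (LTR or RTL)."""
    return unicodedata.bidirectional(ch) in STRONG


# Latin letter, Hebrew alef, comma, space, digit, exclamation mark
samples = ["a", "\u05d0", ",", " ", "1", "!"]
for ch in samples:
    print(repr(ch), unicodedata.bidirectional(ch), is_strong(ch))
```

The letters come out as strong, while the comma, space, digit and exclamation mark do not: their placement has to be inferred from context or stated explicitly.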