Re: Hebrew encoding (cp1255)

Abdelrazak Younes Wed, 03 Jan 2007 00:06:55 -0800

Dov Feldstern wrote:

Abdelrazak Younes wrote:
It's not so bad, really, when you think about it. If I'm a user who uses a Bidi language, then I know to set RTL to true when I first install LyX, and I'm done: I never have to think about it again.

I disagree here: even a user who doesn't normally write RTL should be able to view a document containing some RTL text correctly. But I agree with what you say below:

OTOH, I guess it's really not so bad to just do the calculation all the time. I mean, I have this switch on, and I don't feel that it slows me down or anything. Maybe the option could be reversed: have RTL support on by default, and allow users who happen to be working on really slow machines and who do not use RTL languages to turn it off. For everyone else, it probably doesn't make a real difference. (I guess profiling could be used to see how costly it really is.)
Also (not that this is necessarily proof that it's good), OpenOffice has the same kind of setting somewhere.
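The profiling suggestion could start with something as simple as timing the step in question. Below is a minimal sketch using std::chrono; `average_ns_per_call` is a hypothetical helper, and the callable passed to it stands for whatever code is being measured (for instance a per-paragraph Bidi computation, which is not shown here). A real profiler such as gprof or Valgrind's callgrind would give a per-function breakdown instead of a single number.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical timing helper (not LyX code): runs `work` `iterations`
// times and returns the average wall-clock time per call in nanoseconds.
// Returns 0.0 when asked for zero iterations.
template <typename F>
double average_ns_per_call(F && work, std::size_t iterations)
{
    using clock = std::chrono::steady_clock;
    if (iterations == 0)
        return 0.0;
    auto const start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        work();
    auto const elapsed = clock::now() - start;
    return std::chrono::duration<double, std::nano>(elapsed).count()
           / static_cast<double>(iterations);
}
```

Note that an optimizing compiler may remove work whose result is never used, so the measured callable should have some observable side effect.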

So, if we don't manage to optimize the Bidi table calculation enough, we can indeed do this: enable RTL by default and provide a way to disable it in some performance settings dialog.


[...]

I am sure that we can intelligently avoid the calculation in an automated way by looking at the Unicode code point. That's actually the big point of Unicode.
I don't understand what you mean. In order to determine whether there's any RTL letter, I have to look at every letter in the paragraph, which I don't have to do if I know ahead of time that there's no RTL language there. I think that's where the time is saved.

No, the time is saved when you don't have to compute the lookup tables. In pre-Unicode times (LyX < 1.5), we used single-byte characters for every language, including Hebrew and Arabic (and sometimes two chars for Arabic, for special composite shaping). We therefore needed to know the language of the paragraph in advance. Per-letter distinction was not possible because a letter could be interpreted differently depending on the encoding.
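The encoding ambiguity can be made concrete with a single byte: in cp1255 the byte 0xE0 is the Hebrew letter alef (U+05D0), while in ISO-8859-1 (Latin-1) the same byte is "à" (U+00E0). The sketch below hard-codes just that one byte; `decode_byte` is a hypothetical illustration, not LyX code, and a real program would use iconv or a similar converter with full tables.

```cpp
#include <string>

// Hypothetical illustration: decode one byte under two legacy 8-bit
// encodings. Only ASCII and the byte 0xE0 are handled.
char32_t decode_byte(unsigned char byte, std::string const & encoding)
{
    if (byte < 0x80)
        return byte;               // ASCII range is identical in both
    if (byte == 0xE0) {
        if (encoding == "cp1255")
            return U'\u05D0';      // HEBREW LETTER ALEF
        if (encoding == "latin1")
            return U'\u00E0';      // LATIN SMALL LETTER A WITH GRAVE
    }
    return U'\uFFFD';              // REPLACEMENT CHARACTER: not covered here
}
```

Once text is stored as Unicode code points, the value alone identifies the script, which is what makes a per-letter test possible without knowing the paragraph language.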

Unicode only helps me once I'm looking at the letter. Am I missing something?

I think so. With Unicode, the character set is fixed, and there is one range reserved for Hebrew letters and another reserved for Arabic letters. Testing for Arabic is as simple as this:


// True for code points in the basic Arabic block, U+0600..U+06FF.
bool is_arabic(char_type c)
{
        return c >= 0x0600 /*1536*/ && c <= 0x06FF /*1791*/;
}
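Extending the same idea to Hebrew (block U+0590..U+05FF) gives a cheap per-paragraph check: one linear pass that stops at the first RTL letter, after which the expensive Bidi tables need only be built for paragraphs that actually contain RTL text. This is a sketch under assumptions: `char_type` is taken to be `char32_t` here and the paragraph text a `std::u32string`, `is_arabic` is repeated so the snippet compiles on its own, and `is_hebrew`, `is_rtl` and `paragraph_has_rtl` are hypothetical names, not LyX's actual functions.

```cpp
#include <algorithm>
#include <string>

// Assumption for this sketch: one char_type holds one UCS-4 code point.
typedef char32_t char_type;

// Basic Arabic block, U+0600..U+06FF (same test as above).
bool is_arabic(char_type c)
{
    return c >= 0x0600 && c <= 0x06FF;
}

// Hebrew block, U+0590..U+05FF.
bool is_hebrew(char_type c)
{
    return c >= 0x0590 && c <= 0x05FF;
}

bool is_rtl(char_type c)
{
    return is_hebrew(c) || is_arabic(c);
}

// Single pass over the paragraph text; std::any_of stops at the first
// RTL letter, so purely LTR paragraphs cost one comparison per character.
bool paragraph_has_rtl(std::u32string const & text)
{
    return std::any_of(text.begin(), text.end(), is_rtl);
}
```

A caller would then compute the Bidi tables only when `paragraph_has_rtl(text)` is true. The ranges above cover just the two basic blocks; Arabic Supplement, the Arabic Presentation Forms, Syriac, Thaana and others are also RTL, and the Bidi_Class property from the Unicode Character Database is the authoritative source.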

Abdel.
