Re: Hebrew encoding (cp1255)

Dov Feldstern Sun, 31 Dec 2006 13:05:55 -0800

Georg Baum wrote:

On Saturday 30 December 2006 21:26, Dov Feldstern wrote:
I'm sorry for not being clear. You're right --- using the file
heb142-default.lyx, there is a problem with the display, but the
generated latex is correct. But the reason (for both of those things) is
that the lyx file was originally created with 1.4.2 --- so it's not a
unicode file at all!
No, that is not the reason. As you probably know LyX converts old files with lyx2lyx to the current format. This conversion works perfectly for the file heb142-default.lyx. It does also work perfectly for files with fixed encoding such as cp1255. It does not work for files with multiple encodings (http://bugzilla.lyx.org/show_bug.cgi?id=3049), but I sent a patch yesterday that fixes this bug. With this patch you should not have any file format related bugs anymore (at least I don't know any).
(And I guess this also implies a problem with conversion of files from 1.4.X to 1.5.0?),
No, see above.
So on the one hand, 1.5 doesn't know how to display the characters anymore (I'm not sure exactly
why, though);
But I know: Since we don't know how LaTeX interprets the "default" encoding (this depends on many things) we treat it internally (and in the lyx2lyx conversion) as latin1. Therefore lyx2lyx interprets files in this encoding as latin1 and converts that to unicode. Since in your case the encoding was actually interpreted as cp1255 the unicode characters in LyX are wrong. However, the generated output is OK, since the wrong unicode characters are reconverted to latin1. This is then interpreted as cp1255 by LaTeX, and you get the correct DVI output. I guess that LyX 1.4 displayed the characters correctly, because it used the language to determine the encoding. If that is true we can do the same in 1.5, but I am not sure yet.
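
To make the round trip described above concrete, here is a small Python sketch. It only mimics the byte-level behaviour with Python's standard codecs and is not LyX or lyx2lyx code; the sample word and variable names are chosen purely for illustration.

```python
# Illustration only: mimics the round trip described above with Python's
# standard codecs. This is not how LyX or lyx2lyx is implemented.

hebrew = "שלום"                      # sample text, chosen for illustration

# Bytes as a 1.4.2 file would hold them when the text was typed as cp1255.
raw = hebrew.encode("cp1255")

# The "default" encoding is treated as latin1, so the bytes are decoded
# with the wrong codec and the resulting unicode characters are wrong.
misread = raw.decode("latin1")

# On export the wrong characters are converted back to latin1. latin1 maps
# every byte value to exactly one character, so the original bytes return.
written = misread.encode("latin1")

# LaTeX then reads those bytes as cp1255 and typesets the right letters.
recovered = written.decode("cp1255")

print(repr(misread))                 # accented Latin letters, not Hebrew
print(written == raw)                # bytes survive the round trip
print(recovered == hebrew)           # output text is correct again
```

The display is wrong because of the middle step, and the output is right because the first and last steps cancel each other out.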

Georg, you win. I stand corrected. And thanks for the detailed explanation, I think I understand the situation now.

So it seems to me that what we really want (for Hebrew, at least) is for LyX (and lyx2lyx) to determine the encoding based on the language, if the encoding is set to "default" (and maybe also "auto"?). I understand, however, that that may not be the right thing for other languages...
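
As a rough sketch of the rule proposed above, in Python rather than in LyX's own code: the table and function names are hypothetical, only the Hebrew entry comes from this thread, and the latin1 fallback is the behaviour Georg describes for 1.5.

```python
# Hypothetical sketch of "pick the encoding from the language when the
# document encoding is 'default'". None of these names exist in LyX.

# Only the Hebrew entry is taken from this thread; other languages would
# need their own entries, and may not want this rule at all.
LANGUAGE_ENCODINGS = {"hebrew": "cp1255"}

# What 1.5 assumes for "default" according to the explanation above.
FALLBACK_ENCODING = "latin1"


def effective_encoding(doc_encoding, language):
    """Return the encoding to use for text in the given language."""
    if doc_encoding == "default":
        # Consult the language instead of assuming latin1 unconditionally.
        return LANGUAGE_ENCODINGS.get(language, FALLBACK_ENCODING)
    # An explicitly chosen encoding is left alone.
    return doc_encoding


print(effective_encoding("default", "hebrew"))
print(effective_encoding("default", "english"))
```

Whether "auto" should go through the same lookup is left open here, as it is in the text.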

If you can point me to where in the code this is happening, I'd be willing to take a shot at trying to patch it up. I keep asking you to fix things, but I'm willing to try and help, too...

But here's where the second problem arises, and this time it's LyX's
problem, not latex's (though I'm less sure about this part): it seems
to me like LyX itself --- not only latex --- is also determining the
encoding based on the paragraph, rather than based on the individual
characters' language.


Yes. It is implemented like that because of the limitation of older
inputenc packages.


There's no real reason why LyX should limit itself just because latex
does. Here exactly is an example where latex will manage, if only LyX
would.


I don't think that latex would manage, but I'll create a test patch so
that we can try out.


The reason I say that latex will manage, is that the generated latex
file should look exactly the same as the file generated by 1.4.2, which
does work...


I don't understand. 1.4 has the same inputenc limitation, or is that not true?

Yes, 1.4 does have the same inputenc limitation. That's why we use "default" encoding instead of "auto". "Auto" uses inputenc, and explicitly states the encoding of each paragraph. But "default" doesn't state anything about encodings at all, and doesn't use inputenc; and latex just manages, I guess because it uses the language in order to determine the encoding, and the language *can* change in the middle of a paragraph.

Dov
