Georg Baum wrote:
On Saturday 30 December 2006 21:26, Dov Feldstern wrote:
I'm sorry for not being clear. You're right --- using the file
heb142-default.lyx, there is a problem with the display, but the
generated latex is correct. But the reason (for both of those things) is
that the lyx file was originally created with 1.4.2 --- so it's not a
unicode file at all!
No, that is not the reason. As you probably know, LyX converts old files with
lyx2lyx to the current format. This conversion works perfectly for the file
heb142-default.lyx. It also works perfectly for files with a fixed encoding
such as cp1255. It does not work for files with multiple encodings
(http://bugzilla.lyx.org/show_bug.cgi?id=3049), but I sent a patch yesterday
that fixes this bug. With this patch you should not have any file format
related bugs anymore (at least I don't know of any).
(And I guess this also implies a problem with
conversion of files from 1.4.X to 1.5.0?),
No, see above.
So on the one hand, 1.5
doesn't know how to display the characters anymore (I'm not sure exactly
why, though);
But I know: Since we don't know how LaTeX interprets the "default" encoding
(this depends on many things), we treat it internally (and in the lyx2lyx
conversion) as latin1. Therefore lyx2lyx interprets files in this encoding as
latin1 and converts that to unicode. Since in your case the encoding was
actually interpreted as cp1255, the unicode characters in LyX are wrong.
However, the generated output is OK, since the wrong unicode characters are
converted back to latin1 on export, which restores the original bytes. This
is then interpreted as cp1255 by LaTeX, and you get the correct DVI output.
I guess that LyX 1.4 displayed the characters correctly because it used the
language to determine the encoding. If that is true we can do the same in
1.5, but I am not sure yet.
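
To make the round trip above concrete, here is a minimal Python sketch. It is
not LyX or lyx2lyx code; the function name and the sample word are only
illustrative, and it assumes the file really holds cp1255 bytes while the
"default" encoding is treated as latin1.

```python
def roundtrip(text, real_encoding="cp1255", assumed_encoding="latin1"):
    """Simulate a file written in real_encoding but read as assumed_encoding."""
    # Bytes as stored in the old 1.4.2 file.
    original_bytes = text.encode(real_encoding)
    # The conversion assumes latin1, so the unicode text shown in LyX is wrong.
    shown_in_lyx = original_bytes.decode(assumed_encoding)
    # On export the wrong characters are written back as latin1 ...
    exported_bytes = shown_in_lyx.encode(assumed_encoding)
    # ... and LaTeX reads those bytes as cp1255 again.
    seen_by_latex = exported_bytes.decode(real_encoding)
    return shown_in_lyx, exported_bytes == original_bytes, seen_by_latex


hebrew = "\u05e9\u05dc\u05d5\u05dd"  # a four-letter Hebrew sample word
displayed, bytes_preserved, final = roundtrip(hebrew)
print(ascii(displayed))              # wrong (accented Latin) characters
print(bytes_preserved, final == hebrew)
```

The display is wrong because the bytes are decoded with the wrong table, but
since latin1 maps every byte to exactly one character and back, the exported
bytes are identical to the original ones, and the output comes out right.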
Georg, you win. I stand corrected. And thanks for the detailed
explanation, I think I understand the situation now.
So it seems to me that what we really want (for Hebrew, at least) is for
LyX (and lyx2lyx) to determine the encoding based on the language, if
the encoding is set to "default" (and maybe also "auto"?). I understand,
however, that that may not be the right thing for other languages...
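
Roughly what I have in mind, as a Python sketch only: the table and the
function name are made up for illustration and are not taken from the LyX
sources, and only the "default" case is handled, since I am not sure about
"auto".

```python
# Hypothetical language -> encoding table; the real one would have to come
# from wherever LyX keeps its language definitions.
LANGUAGE_ENCODINGS = {
    "hebrew": "cp1255",
    "english": "latin1",
}


def effective_encoding(doc_encoding, language, fallback="latin1"):
    """Pick the encoding to use when reading text in the given language."""
    if doc_encoding == "default":
        # No explicit encoding: guess from the language, else use the fallback.
        return LANGUAGE_ENCODINGS.get(language, fallback)
    # An explicit encoding such as cp1255 is used as-is.
    return doc_encoding


raw = b"\xf9\xec\xe5\xed"  # cp1255 bytes from an old file
text = raw.decode(effective_encoding("default", "hebrew"))
```

With the language taken into account, the same bytes decode to Hebrew
letters instead of latin1 characters.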
If you can point me to where in the code this is happening, I'd be
willing to take a shot at trying to patch it up. I keep asking you to
fix things, but I'm willing to try and help, too...
But here's where the second problem arises, and this time it's LyX's
problem, not latex's (though I'm less sure about this part): it seems
to me like LyX itself --- not only latex --- is also determining the
encoding based on the paragraph, rather than based on the individual
characters' language.
Yes. It is implemented like that because of the limitation of older
inputenc packages.
There's no real reason why LyX should limit itself just because latex
does. Here exactly is an example where latex will manage, if only LyX
would.
I don't think that latex would manage, but I'll create a test patch so
that we can try it out.
The reason I say that latex will manage is that the generated latex
file should look exactly the same as the file generated by 1.4.2, which
does work...
I don't understand. 1.4 has the same inputenc limitation, or is that not true?
Yes, 1.4 does have the same inputenc limitation. That's why we use
"default" encoding instead of "auto". "Auto" uses inputenc, and
explicitly states the encoding of each paragraph. But "default" doesn't
state anything about encodings at all, and doesn't use inputenc; and
latex just manages, I guess because it uses the language in order to
determine the encoding, and the language *can* change in the middle of a
paragraph.
Dov