Re: Hebrew encoding (cp1255)

Georg Baum Sat, 30 Dec 2006 14:37:00 -0800

On Saturday 30 December 2006 21:26, Dov Feldstern wrote:
> I'm sorry for not being clear. You're right --- using the file
> heb142-default.lyx, there is a problem with the display, but the
> generated latex is correct. But the reason (for both of those things) is
> that the lyx file was originally created with 1.4.2 --- so it's not a
> unicode file at all!


No, that is not the reason. As you probably know, LyX converts old files with
lyx2lyx to the current format. This conversion works perfectly for the file
heb142-default.lyx. It also works perfectly for files with a fixed encoding
such as cp1255. It does not work for files with multiple encodings
(http://bugzilla.lyx.org/show_bug.cgi?id=3049), but I sent a patch yesterday
that fixes this bug. With this patch you should not have any file format
related bugs anymore (at least I don't know of any).

> (And I guess this also implies a problem with 
> conversion of files from 1.4.X to 1.5.0?),

No, see above.

> So on the one hand, 1.5 
> doesn't know how to display the characters anymore (I'm not sure exactly
> why, though);

But I know: Since we don't know how LaTeX interprets the "default" encoding
(this depends on many things), we treat it internally (and in the lyx2lyx
conversion) as latin1. Therefore lyx2lyx interprets files in this encoding as
latin1 and converts that to unicode. Since in your case the encoding was
actually interpreted as cp1255, the unicode characters in LyX are wrong.
However, the generated output is OK, since the wrong unicode characters are
converted back to latin1. This is then interpreted as cp1255 by LaTeX, and you
get the correct DVI output. I guess that LyX 1.4 displayed the characters
correctly because it used the language to determine the encoding. If that is
true we can do the same in 1.5, but I am not sure yet.
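
The round trip described above can be sketched in a few lines of Python. This
is only an illustration using Python's standard codecs, not the actual lyx2lyx
or LyX code; the sample word and the variable names are my own assumptions.

```python
# Illustration only: standard Python codecs, not LyX or lyx2lyx code.
hebrew = "\u05e9\u05dc\u05d5\u05dd"      # a sample Hebrew word (assumed example)

# A 1.4-era file using the "default" encoding really holds cp1255 bytes.
raw = hebrew.encode("cp1255")

# The conversion treats "default" as latin1, so the unicode text is wrong.
misread = raw.decode("latin-1")
print(repr(misread))                      # accented Latin letters, not Hebrew

# On export the wrong characters are converted back to latin1 bytes ...
exported = misread.encode("latin-1")

# ... which are byte for byte what LaTeX reads as cp1255.
print(exported == raw)
print(exported.decode("cp1255") == hebrew)
```

Because latin1 maps every byte value to exactly one character, the wrong
decoding loses no information, which is why the display is broken while the
exported file is still correct.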

> and on the other hand, it has no problem when generating 
> the latex, because it doesn't have to encode it --- it's already encoded.
>
> But this doesn't reflect the true situation in 1.5. There, an equivalent
> file (attached) has the Hebrew represented as unencoded unicode. So LyX
> now displays it correctly; however, when generating the latex file, it
> depends on the encoding option:
> *) If encoding is set to cp1255, everything is okay;
> *) If encoding is set to "auto", then generating the latex file gets
> stuck on the English paragraph in which a Hebrew word appears.  This is
> because LyX is trying to encode the entire paragraph using a single
> encoding. So a Hebrew paragraph is encoded as cp1255, and everything is
> ok. But an English paragraph with Hebrew in it is encoded as latin1 or
> whatever, but suddenly runs into some characters which are Hebrew, and
> that's where iconv complains. If you look in lyx's temp dir, you can see
> the beginning of the file, up until the problematic paragraph.

The reason is known: the per-paragraph encoding.
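
A minimal sketch of that failure, using Python's codecs in place of iconv. The
helper first_unencodable() and the sample paragraph are hypothetical and only
serve to illustrate the point; this is not how LyX itself performs the export.

```python
# Illustration only: Python codecs stand in for iconv here.
def first_unencodable(text, encoding):
    """Return the index of the first character that `encoding`
    cannot represent, or None if the whole text can be encoded."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as err:
        return err.start
    return None

# An English paragraph containing one Hebrew word (assumed example).
paragraph = "An English paragraph with \u05e9\u05dc\u05d5\u05dd in it."

# With one encoding per paragraph, the English paragraph goes out as
# latin1 and the conversion stops at the first Hebrew character.
bad = first_unencodable(paragraph, "latin-1")
print("written before the failure:", repr(paragraph[:bad]))

# The same text encodes cleanly as cp1255, which also covers ASCII.
print(first_unencodable(paragraph, "cp1255"))
```

This matches the truncated file in the temp dir: everything before the first
character that does not fit the paragraph's encoding gets written, and nothing
after it.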

> *) If the encoding is set to "default", then iconv complains already at
> the Hebrew paragraph, so something there is still wrong.

The reason is also known: As I explained above, the "default" encoding in LyX
1.5 currently means that you have to enter the latin1 character whose code
point in latin1 is the same as the code point that the character you want in
the output has in the encoding that LaTeX will use.
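
To make that concrete, here is a small sketch that computes which latin1
character would have to be entered for a given output character. The function
latin1_standin() is hypothetical and exists only for this illustration; LyX
has no such function.

```python
# Illustration only: latin1_standin() is a made-up helper, not part of LyX.
def latin1_standin(char, latex_encoding):
    """Return the latin1 character that shares its code point with
    `char` as encoded in `latex_encoding` (single-byte encodings only)."""
    return char.encode(latex_encoding).decode("latin-1")

alef = "\u05d0"                           # HEBREW LETTER ALEF
standin = latin1_standin(alef, "cp1255")

# Alef is byte 0xE0 in cp1255, and 0xE0 in latin1 is U+00E0.
print(hex(ord(standin)))
```

So with the "default" encoding and LaTeX reading the file as cp1255, one would
have to enter U+00E0 (a with grave accent) to get an alef in the output.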


> >>>>But here's where the second problem arises, and this time it's LyX's
> >>>>problem, not latex's (though I'm less sure about this part): it seems
> >>>> to me like LyX itself --- not only latex --- is also determining the
> >>>> encoding based on the paragraph, rather than based on the individual
> >>>> characters' language.
> >>>
> >>>Yes. It is implemented like that because of the limitation of older
> >>>inputenc packages.
> >>
> >>There's no real reason why LyX should limit itself just because latex
> >>does. Here exactly is an example where latex will manage, if only LyX
> >>would.
> >
> > I don't think that latex would manage, but I'll create a test patch so
> > that we can try out.
>
> The reason I say that latex will manage, is that the generated latex
> file should look exactly the same as the file generated by 1.4.2, which
> does work...

I don't understand. 1.4 has the same inputenc limitation, or is that not true?


Georg
