Re: about tex2lyx and Unicode

Andre Poenitz Mon, 13 Oct 2008 12:51:19 -0700

On Mon, Oct 13, 2008 at 09:31:34PM +0200, Georg Baum wrote:
> Uwe Stöhr wrote:
> 
> > "magic" does not help in general as we will still stay stuck where we
> > are. We need a general solution to be able to create a lyxformat newer
> > than 248. I attached a LyX file and its TeX output. This one compiles fine
> > with latex and pdflatex. The exercise we have is to import this TeX file
> > to get the same as in the LyX file. (I already implemented tex2lyx support
> > for the language handling, like \selectlanguage. What is missing is the
> > handling of the encoding.)
> 
> I don't see the problem with the patch. IIRC a utf8 encoding (if specified
> explicitly) is also valid for format 248. Of course the patch does not
> solve the general unicode problem in tex2lyx. I am absolutely no fan of
> magic either, but AFAICS in the xetex case there is no other way than to
> apply magic, either like this or by trying to detect the used encoding.
> Both methods are not 100% reliable, but work in most cases. So, the patch
> improves the situation, but of course much more work is needed. In the long
> run an autodetection of the encoding is probably better, because not all
> xetex documents have this special comment. If the encoding can be overridden
> from the command line (for the rare cases where the autodetection does not
> work) every document can be converted. For the moment the
> conclusion "special comment _and_ no inputenc command => utf8" is even 100%
> safe: Both ascii and latin1 are a subset of utf8, and tex2lyx cannot handle
> any other encoding anyway (it outputs hardcoded latin1 in several cases (bug
> 4299)). Therefore even a hardcoded utf8 would be an improvement!


Actually I think we should switch .lyx encoding to utf8 for good. Of
course, the parameter to \inputencoding can and should be kept and if
possible used in the output, but otherwise there is no real reason to
keep an uncertain encoding in the .lyx format.

> The natural way would be to convert tex2lyx to docstring, use an ifdocstream
> to read the file (beginning with an ascii encoding), and switch the
> encoding whenever a command like \inputencoding (or a magic comment, or, if
> at the beginning, a non-ascii character) is read. This would work very
> similarly to the LaTeX export mechanism, and should not be too difficult to
> implement. When I implemented the codecvt facets I made sure that they work
> both for output and input (don't know if that is still the case).

Or slurp in the contents of the .tex file and try various encodings
until we find one that "does the trick", possibly after cutting it
into parts for which we know that the encoding stays constant.
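
A rough sketch of that trial-and-error idea, written in Python only for
brevity (tex2lyx itself is C++). The function names, the candidate list
and the choice of \inputencoding{...} as the cut point are assumptions
made for illustration, not existing tex2lyx code:

```python
import re

# Hypothetical cut point: any \inputencoding{...} command in the raw bytes.
SWITCH = re.compile(rb"\\inputencoding\{[^}]*\}")


def split_constant_parts(data: bytes) -> list:
    # Cut the file so that each part starts at an encoding switch and
    # therefore (by assumption) uses a single encoding throughout.
    parts, start = [], 0
    for match in SWITCH.finditer(data):
        if match.start() > start:
            parts.append(data[start:match.start()])
        start = match.start()
    parts.append(data[start:])
    return parts


def guess_encoding(data: bytes, candidates=("ascii", "utf-8", "latin-1")):
    # Return (name, text) for the first candidate that decodes without error.
    # latin-1 accepts every byte sequence, so it has to come last.
    for name in candidates:
        try:
            return name, data.decode(name)  # strict decoding raises on bad bytes
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits")


if __name__ == "__main__":
    raw = ("abc \u00e9".encode("utf-8")
           + b"\\inputencoding{latin1}"
           + "\u00e9".encode("latin-1"))
    for part in split_constant_parts(raw):
        print(guess_encoding(part)[0])
```

A clean decode only shows that the bytes are valid in that encoding, not
that the guess is right, so this is no more reliable than the magic
comment; an override from the command line would still be needed.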

> I believe that using the existing codecvt facets and idocstream is easier
> than calling iconv directly, because it does not interfere with the
> structure of the output.
> 
> BTW, does the fact that you are now working on this mean that the
> tex2lyx-python ghost is finally dead?

It never looked alive enough to justify the energy needed to bring
the flame thrower into action...

Andre'
