Uwe Stöhr wrote:

> "magic" does not help in general as we will still stay stuck where we
> are. We need a general solution to be able to create a lyxformat newer
> than 248. I attached a LyX file and its TeX output. This one compiles fine
> with latex and pdflatex. The exercise we have is to import this TeX file
> to get the same as in the LyX file. (I already implemented tex2lyx support
> for the language handling, like \selectlanguage. What is missing is the
> handling of the encoding.)

I don't see the problem with the patch. IIRC a utf8 encoding (if specified
explicitly) is also valid for format 248. Of course the patch does not solve
the general unicode problem in tex2lyx. I am absolutely no fan of magic
either, but AFAICS in the xetex case there is no other way than to apply
magic, either like this or by trying to detect the used encoding. Neither
method is 100% reliable, but both work in most cases. So, the patch improves
the situation, but of course much more work is needed.

In the long run an autodetection of the encoding is probably better, because
not all xetex documents have this special comment. If the encoding can be
overridden from the command line (for the rare cases where the autodetection
does not work) every document can be converted.

For the moment the conclusion "special comment _and_ no inputenc command =>
utf8" is even 100% safe: ascii is a subset of utf8, every latin1 character
can be represented in utf8, and tex2lyx cannot handle any other encoding
anyway (it outputs hardcoded latin1 in several cases (bug 4299)). Therefore
even a hardcoded utf8 would be an improvement!

> I prefer that we agree to a basic concept how to fix this. I proposed one
> that will work in all cases, but I don't know if iconv can handle that.

The natural way would be to convert tex2lyx to docstring, use an ifdocstream
to read the file (beginning with an ascii encoding), and switch the encoding
whenever a command like \inputencoding (or a magic comment, or, if at the
beginning, a non-ascii character) is read. This would work very similarly to
the LaTeX export mechanism, and should not be too difficult to implement.
When I implemented the codecvt facets I made sure that they work both for
output and input (don't know if that is still the case).

> What I wanted to point out in my last comments is that a TeX file is in
> general a multi-encoding document. The encodings of the different document
> parts are given by the options of inputenc and \inputencoding, see also the
> attached TeX file.

Exactly. And you can have even more fun with the different methods to switch
encodings for CJK languages ;-)

> So my proposal is to read the encodings from the TeX file and convert the
> document parts via iconv to utf8 and then build the LyX file.

I believe that using the existing codecvt facets and idocstream is easier
than calling iconv directly, because it does not interfere with the structure
of the output.

BTW, does the fact that you are now working on this mean that the
tex2lyx-python ghost is finally dead?

Georg
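
To make the "switch the encoding while reading" idea concrete, here is a
minimal Python sketch. It is not tex2lyx code (tex2lyx is C++ and would use
the codecvt facets); it only illustrates the logic. It starts with ascii,
applies the "special comment and no inputenc command => utf8" rule, and
switches the codec whenever it meets \inputencoding or an inputenc package
option. The names decode_tex, initial_encoding and LATEX_TO_CODEC are made up
for illustration, the mapping table is incomplete, and the form of the
special comment (a TeXShop-style "%!TEX encoding = UTF-8" line) is an
assumption. Commented-out commands, multiple inputenc options and the CJK
mechanisms are ignored.

```python
import re

# Assumed mapping from inputenc option names to Python codec names
# (illustrative and far from complete).
LATEX_TO_CODEC = {
    "ascii": "ascii",
    "latin1": "latin-1",
    "latin9": "iso8859-15",
    "utf8": "utf-8",
    "cp1252": "cp1252",
    "ansinew": "cp1252",
}

# The command names are pure ASCII, so they can be located in the raw bytes
# of any ASCII-compatible encoding before the text is decoded.
SWITCH = re.compile(
    rb"\\inputencoding\{([^}]+)\}|\\usepackage\[([^\]]+)\]\{inputenc\}"
)

# Assumption: the "special comment" is a TeXShop-style magic line.
MAGIC = re.compile(
    rb"^%\s*!TEX\s+encoding\s*=\s*UTF-8", re.IGNORECASE | re.MULTILINE
)


def initial_encoding(data: bytes) -> str:
    """Special comment and no inputenc command => utf8, otherwise ascii."""
    has_inputenc = re.search(rb"\\usepackage(\[[^\]]*\])?\{inputenc\}", data)
    if MAGIC.search(data) and not has_inputenc:
        return "utf-8"
    return "ascii"


def decode_tex(data: bytes) -> str:
    """Decode a multi-encoding TeX file piece by piece into one str."""
    codec = initial_encoding(data)
    pieces = []
    pos = 0
    for m in SWITCH.finditer(data):
        # Everything up to and including the command is in the old encoding.
        pieces.append(data[pos:m.end()].decode(codec))
        name = (m.group(1) or m.group(2)).decode("ascii").strip()
        codec = LATEX_TO_CODEC[name]  # KeyError for encodings not in the table
        pos = m.end()
    # The remainder of the file uses whatever encoding was selected last.
    pieces.append(data[pos:].decode(codec))
    return "".join(pieces)
```

Feeding it a file whose first part is latin1 and whose second part (after
\inputencoding{utf8}) is utf8 yields a single unicode string, which is what
the docstring-based reader would hand to the rest of tex2lyx. An override
from the command line would simply replace the result of initial_encoding.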