Guenter Milde wrote:

> On 2013-03-31, Georg Baum wrote:
>> Guenter Milde wrote:
>>> On 2013-03-27, Georg Baum wrote:
> 
> ...
> 
>>> Without file inclusions, the "LaTeX encoding" of the exported file does
>>> not matter for the PostScript/PDF generation:
> 
>>> * The encoding of the LyX-document itself is always utf8 (since several
>>>   versions of LyX).
>>> * (Re)import into LyX converts from the "LaTeX encoding" into utf8.
>>> * With 8-bit LaTeX, every non-ASCII character is converted to LICRs
>>>   (either by LyX, if the encoding is set to ASCII, or by the inputenc
>>>   package).
>>> * With XeLaTeX/LuaLaTeX, the LaTeX encoding is always utf8.
> 
>> In theory you are right. In practice this is not always the case for
>> 8-bit LaTeX. Either you choose the utf8 encoding; then you need to load
>> one of the existing utf8 support packages (utf8 or utf8x), but none of
>> them is complete, and listings does not work with utf8. Or you choose
>> any other encoding; then you rely on lib/unicodesymbols, which is
>> incomplete as well and may even load packages that are incompatible
>> with each other.
> 
> I don't know about CJK and other Asian languages, but for Latin, Greek,
> and Cyrillic, lib/unicodesymbols is more complete than any LaTeX inputenc
> file. The problem of incompatible packages needs to be resolved, but again
> this would not go away using one of the existing "LaTeX encodings" (let
> alone a mix of them).

Yes (assuming the package incompatibilities are caused only by additional 
symbols that are not in any inputenc file).

> The "force" flag in lib/unicodesymbols provides a workaround for the
> incomplete translation in inputenc's utf8.
> 
> utf8x is non-standard and unsupported and should only be used by users
> that know the dangers and incompatibilities.

Sure, but IIRC it supports symbols that are not supported by inputenc's 
standard utf8. If my memory is wrong, we should seriously think of removing 
utf8x support from LyX.
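
(As an aside, individual gaps in inputenc's standard utf8 can also be 
patched by hand in the preamble. A minimal sketch of the mechanism; the 
particular character and macro are only meant as an illustration, not as 
something LyX emits:

  \usepackage[utf8]{inputenc}
  \usepackage{tipa}% provides \textschwa
  % teach inputenc's utf8 about U+0259, which it does not know by default
  \DeclareUnicodeCharacter{0259}{\textschwa}

As far as I understand, this is the same kind of gap that the "force" 
entries in lib/unicodesymbols have to work around on the LyX side.)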

> I agree with "listings". Does it work with 8-bit encodings?

Yes.
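
(For others following the thread: what works is something along these 
lines, where the included file is 8-bit encoded; with utf8 input the 
multi-byte characters break inside the listing:

  \documentclass{article}
  \usepackage[latin1]{inputenc}
  \usepackage{listings}
  \lstset{extendedchars=true}% 8-bit characters are single bytes, so this is fine
  \begin{document}
  \begin{lstlisting}
  Grüße aus München
  \end{lstlisting}
  \end{document}

The file itself would of course have to be saved in latin1 for this to 
compile.)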

>>> For included files it is IMO quite sensible to assume the locale
>>> encoding as a first guess. If the "LaTeX encoding" and the locale
>>> encoding are the same, chances are best that no re-encoding is required.
> 
>> Why is it sensible to choose the locale encoding? This assumes that the
>> document language matches the locale, but this is an invalid assumption
>> IMHO, as I tried to explain. I know that many text editors assume that as
>> default, but that does not make it better.
> 
> For LaTeX documents, there is no requirement that the encoding matches a
> language default. All characters can be represented in the LaTeX internal
> character representation (LICR), a pure ASCII encoding using a
> combination of accent and character macros. Both inputenc's *.enc files
> and the translations in lib/unicodesymbols transform into LICRs, so the
> 8-bit encoding default specified in "lib/languages" is merely for
> convenience (and from a pre-utf8 time).
> 
> The encoding default of the OS can be assumed to be the encoding of the
> majority of files on the system. Hence, this choice would minimize
> problems with included files.

As I tried to explain, this is a dangerous assumption, and I don't agree 
with it, but I fear we can't come to a common understanding here.

>>> This is why the current default (language-dependent multi-encoding)
>>> is an outdated and very bad choice. It was justified to a certain degree
>>> when LyX still used 8-bit encodings for the *.lyx file itself, but that
>>> was several years ago.
> 
>> I do not agree. The main purpose of the exported LaTeX is not to be
>> edited with a text editor, but to be typeset. For the latter purpose the
>> default is IMHO still the best one, at least as long as utf8 support in
>> 8-bit LaTeX is as limited as it is nowadays.
> 
> There are several purposes of exported LaTeX. Of course, the "internally"
> exported file will be typeset directly. Explicit export (File>Export>...)
> is done for storing in a more generic format, post-processing,
> sharing with non-LyX co-workers, etc. In all these use cases, a mixed
> encoding is more of an annoyance than a help.

Please don't assume too much about the users. If I had to choose between a 
mixed encoding and a file cluttered with macros for umlauts, I'd choose the 
former.
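
To illustrate what I mean by "cluttered": with ASCII as the export 
encoding, a harmless phrase like "Grüße aus München" comes out roughly as

  Gr\"u\ss{}e aus M\"unchen

which typesets fine, but is no fun to read or edit in a text editor.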

> If not utf8, we should at least use a consistent encoding, either the main
> document language's default 8-bit encoding or ASCII.

I agree that one single encoding (except pure ASCII, which becomes 
unreadable) would be better than a mixed encoding for anything but internal 
typesetting. For internal typesetting it would not matter as long as no 
symbols are missing. If we want to think further about this, there are IMHO 
three major cases to consider (rough sketches of the resulting LaTeX follow 
after the list):

1) Try to use utf8 exclusively. This would work at least as well as the 
mixed case if the inputenc utf8 encoding supported all symbols of all other 
inputenc encodings, and if the utf8 support of CJK.sty were as complete as 
the support for other CJK encodings. I believe that CJK would be OK, but I 
don't know if the utf8 support of inputenc is complete enough nowadays. 
Finally, there is the known problem with listings.sty, and there are 
probably more packages that do not work with variable-width encodings like 
utf8. This option would still require some minor code changes in LyX (e.g. 
to wrap CJK languages in CJK UTF8 environments).

2) Try to use any single encoding other than utf8 exclusively. This only 
works if lib/unicodesymbols contains definitions for all symbols provided 
by most encodings supported by inputenc.sty and CJK.sty. This is probably 
the case for Western European encodings, but it is definitely not the case 
for symbols needed by CJK languages.

3) Like 2), but make an exception for CJK languages and wrap them in CJK 
environments as is done now. This case has the highest chance to work, IMO.
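
To make the three cases a bit more concrete, here are rough sketches of 
what the exported LaTeX could look like. The encodings, CJK font families 
and example characters are only illustrations, not what LyX currently 
emits.

Case 1), utf8 everywhere, CJK wrapped in CJK UTF8 environments:

  \usepackage[utf8]{inputenc}
  \usepackage{CJKutf8}
  ...
  % non-CJK text stays plain utf8 (provided inputenc knows the characters)
  Grüße aus München
  \begin{CJK}{UTF8}{min}% "min" is just an example font family
  日本語のテキスト
  \end{CJK}

Case 2), one single 8-bit encoding, everything outside it taken from 
lib/unicodesymbols:

  \usepackage[latin1]{inputenc}
  \usepackage{textcomp}% provides \texteuro
  ...
  % a "€" cannot be represented in latin1 and is exported as a macro
  The price is 10\texteuro{} per copy.

Case 3), like 2) outside the CJK parts, while the CJK parts keep a CJK.sty 
encoding of their own:

  \usepackage[latin1]{inputenc}
  \usepackage{CJK}
  ...
  \begin{CJK}{JIS}{min}% encoding and family again only as an example
  (Japanese text kept in its own CJK encoding)
  \end{CJK}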

Then I don't know what would happen to languages that use special hardcoded 
support (like Thai); these might pose a problem in all cases.

Please don't get me wrong, I'd rather get rid of the multiple-encoding 
stuff sooner than later. However, I believe that this is currently not 
possible, and that the OS encoding should not be considered at all, because 
that could change the output of the typeset document depending on the OS 
encoding in use.


Georg
