Hi Jonathan,
On 01/05/2010, at 4:06 AM, Jonathan Kew wrote:
> The problem is that at this point, the .aux file is read *with*
> your \XeTeXdefaultencoding declaration in force, so the individual
> utf-8 bytes that were written to it now get interpreted as cp1252
> characters and mapped to their Unicode values, instead of the byte
> sequences being interpreted as utf-8. That's the source of the
> "junk" you're getting. Those utf-8-bytes-interpreted-as-cp1252 then
> get re-encoded to utf-8 sequences as the .toc is written, so in
> effect the original characters have been "doubly encoded".
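
(For concreteness: here is the byte-level round trip for the letter
"é". This illustration is mine, not from Jonathan's message; the byte
values are simply the standard utf-8 and cp1252 encodings.

    "é"    written as utf-8      ->  C3 A9         (bytes in the .aux)
    C3 A9  read back as cp1252   ->  "Ã" "©"       (two unrelated characters)
    "Ã©"   re-encoded as utf-8   ->  C3 83 C2 A9   (bytes in the .toc)

One character has become four bytes: "doubly encoded".)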
This sounds like a pretty generic kind of problem, ...
> In this particular case, at least, you can work around the problem
> by resetting the default encoding immediately before the end of the
> document, so that when LaTeX reads in the .aux file at the end of
> the run, it reads it correctly as utf-8. In other words, if you
> modify this example to become:
>
>   \documentclass[10pt,a4paper]{book}
>   \usepackage[frenchb]{babel}
>   \usepackage{fontspec}
>   \usepackage{xunicode}
>   \usepackage{xltxtra}
>   \begin{document}
>   \frontmatter
>   \tableofcontents
>   \XeTeXinputencoding "cp1252"
>   \XeTeXdefaultencoding "cp1252"
>   \mainmatter\setcounter{secnumdepth}{2}
>   \chapter{Général de Gaulle}
>   Il était français.
>   \XeTeXdefaultencoding "utf-8"
>   \end{document}
>
> then your table of contents should correctly show "Général".
... so that the best solution might be to include
a command such as:

    \AtEndDocument{\XeTeXdefaultencoding "utf-8"}

in the xltxtra package, so that it becomes something
that is always done and authors do not need to worry about it.
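
As a sketch of what an author can already do by hand, before any
package change: this is just Jonathan's example from above, with the
reset moved into the preamble via \AtEndDocument (nothing else is
new, and the body is still assumed to be saved in cp1252):

    \documentclass[10pt,a4paper]{book}
    \usepackage[frenchb]{babel}
    \usepackage{fontspec}
    \usepackage{xunicode}
    \usepackage{xltxtra}
    % restore utf-8 before the .aux is re-read at \end{document}:
    \AtEndDocument{\XeTeXdefaultencoding "utf-8"}
    \begin{document}
    \frontmatter
    \tableofcontents
    \XeTeXinputencoding "cp1252"
    \XeTeXdefaultencoding "cp1252"
    \mainmatter\setcounter{secnumdepth}{2}
    \chapter{Général de Gaulle}
    Il était français.
    \end{document}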
Note that the \@enddocumenthook is expanded more or less
immediately after the \end{document} has been encountered;
certainly before the .aux file is closed for writing
and re-opened for reading.
viz. (from latex.ltx):

\def\enddocument{%
   \let\AtEndDocument\@firstofone
   \@enddocumenthook
   \@checkend{document}%
   \clearpage
   \begingroup
     \if@filesw
       \immediate\closeout\@mainaux
       \let\@setckpt\@gobbletwo
       \let\@newl@bel\@testdef
       \@tempswafalse
       \makeatletter \input\jobname.aux
     \fi
> However, there may be other situations where auxiliary files are
> written and read at unpredictable times during the processing of
> the document, making it more difficult to control the encodings at
> the right moments.
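
(One hypothetical instance, mine rather than Jonathan's: suppose
\listoffigures is called after the switch to cp1252. The .lof file
was written as utf-8, and it is read back at the point of the call,
so an \AtEndDocument reset would come too late; the default encoding
would need restoring around the call itself:

    \XeTeXdefaultencoding "utf-8"    % the .lof is opened as utf-8
    \listoffigures
    \XeTeXdefaultencoding "cp1252"   % back to cp1252 for later files

since \XeTeXdefaultencoding governs files opened after the
declaration.)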
True. That is another advantage of having the solution
recorded in a standard place such as xltxtra.sty,
preferably with some comments about why it is useful.
Then it can be found, and the same fix patched into
the code wherever other kinds of auxiliary files are
written and read back in.
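
Such a patch might look something like the following. This is only a
sketch; the comments are mine, and the one-line command is exactly
the one proposed above:

    %% Proposed for xltxtra.sty:
    %% If the document has changed \XeTeXdefaultencoding (e.g. to read
    %% legacy cp1252 input), the .aux file, which was written as utf-8,
    %% would be re-read at \end{document} under the wrong encoding,
    %% "doubly encoding" any non-ASCII text in it.
    %% Restore utf-8 just before that happens:
    \AtEndDocument{\XeTeXdefaultencoding "utf-8"}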
> In general, moving to an entirely utf-8 environment is a better and
> more robust way forward.
True again, for new documents.
It is still desirable to provide solutions that cope
with the technicalities that arise in other situations.
> HTH,
> Jonathan
All the best,
Ross
------------------------------------------------------------------------
Ross Moore                                r...@maths.mq.edu.au
Mathematics Department                    office: E7A-419
Macquarie University                      tel: +61 (0)2 9850 8955
Sydney, Australia 2109                    fax: +61 (0)2 9850 8114
------------------------------------------------------------------------