Hi Jonathan,

On 01/05/2010, at 4:06 AM, Jonathan Kew wrote:

> The problem is that at this point, the .aux file is read *with* your \XeTeXdefaultencoding declaration in force, so the individual utf-8 bytes that were written to it now get interpreted as cp1252 characters and mapped to their Unicode values, instead of the byte sequences being interpreted as utf-8. That's the source of the "junk" you're getting. Those utf-8-bytes-interpreted-as-cp1252 then get re-encoded to utf-8 sequences as the .toc is written, so in effect the original characters have been "doubly encoded".
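
To make that concrete, here is the round trip for the "é"
in the example below, worked through byte by byte (the
values are just the standard utf-8 and cp1252 tables):

   "é" (U+00E9) written to the .aux as utf-8:  0xC3 0xA9
   those bytes read back as cp1252:            "Ã" (0xC3), "©" (0xA9)
   "Ã©" re-encoded to utf-8 in the .toc:       0xC3 0x83 0xC2 0xA9

so each accented character ends up as two characters
(four bytes) by the time it reaches the .toc file.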

This sounds like a pretty generic kind of problem, ...


> In this particular case, at least, you can work around the problem by resetting the default encoding immediately before the end of the document, so that when LaTeX reads in the .aux file at the end of the run, it reads it correctly as utf-8. In other words, if you modify this example to become:
>
>   \documentclass[10pt,a4paper]{book}
>   \usepackage[frenchb]{babel}
>   \usepackage{fontspec}
>   \usepackage{xunicode}
>   \usepackage{xltxtra}
>   \begin{document}
>   \frontmatter
>   \tableofcontents
>   \XeTeXinputencoding "cp1252"
>   \XeTeXdefaultencoding "cp1252"
>   \mainmatter\setcounter{secnumdepth}{2}
>   \chapter{Général de Gaulle}
>   Il était français.
>   \XeTeXdefaultencoding "utf-8"
>   \end{document}
>
> then your table of contents should correctly show "Général".
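
(Without that final reset, the doubly encoded characters
land in the .toc, and on the next run the table of contents
shows something like "GÃ©nÃ©ral" instead.)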

  ... so that the best solution might be to include
a command such as:

   \AtEndDocument{\XeTeXdefaultencoding "utf-8"}

in the  xltxtra  package, so that it becomes something
that is always done, and authors do not need to worry about it.
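
In the meantime, an author can put that same line in the
preamble of an individual document; a minimal sketch, using
the same packages as the example above:

   \documentclass[10pt,a4paper]{book}
   \usepackage[frenchb]{babel}
   \usepackage{fontspec}
   \usepackage{xunicode}
   \usepackage{xltxtra}
   \AtEndDocument{\XeTeXdefaultencoding "utf-8"}
   \begin{document}
   ...
   \end{document}

This keeps the fix in one place, rather than relying on
remembering to reset the encoding by hand just before
\end{document}.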

Note that the  \@enddocumenthook  is expanded more or less
immediately after  \end{document}  has been encountered;
certainly before the .aux file is closed for writing,
and re-opened for reading.

viz.  (from  latex.ltx )

\def\enddocument{%
   \let\AtEndDocument\@firstofone
   \@enddocumenthook
   \@checkend{document}%
   \clearpage
   \begingroup
     \if@filesw
       \immediate\closeout\@mainaux
       \let\@setckpt\@gobbletwo
       \let\@newl@bel\@testdef
       \@tempswafalse
       \makeatletter \input\jobname.aux
     \fi

> However, there may be other situations where auxiliary files are written and read at unpredictable times during the processing of the document, making it more difficult to control the encodings at the right moments.

True. That is another advantage of having the solution
recorded in a standard place such as  xltxtra.sty ,
preferably with some comments about why it is useful.
Then it can be found, and the same fix patched into
the code wherever other kinds of auxiliary files are
written and read back in.
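
For example, the patch at such a site might look like this
(the file name  \jobname.xyz  and its surrounding package
code are hypothetical, just for illustration):

   \XeTeXdefaultencoding "utf-8"    % bytes in the file are utf-8
   \input \jobname.xyz
   \XeTeXdefaultencoding "cp1252"   % re-assert the legacy default

with the last line re-asserting by hand whatever legacy
encoding the document had declared (cp1252 in the example
above).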

> In general, moving to an entirely utf-8 environment is a better and more robust way forward.

True again, for new documents.
It is still desirable to provide solutions that cope
with the technicalities that arise in other situations.



> HTH,
>
> Jonathan


All the best,

        Ross

------------------------------------------------------------------------
Ross Moore                                       r...@maths.mq.edu.au
Mathematics Department                           office: E7A-419
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------





