On 21/02/2021 22:55, Ross Moore wrote:
Hi David,

On 22 Feb 2021, at 8:43 am, David Carlisle <d.p.carli...@gmail.com> wrote:

    Surely the line-end characters are already known, and the bits&bytes
    have been read up to that point *before* tokenisation.


This is not a pdflatex inputenc-style UTF-8 error, where a stream of tokens fails to map.

It happens at the file-reading stage. If you have the file encoding wrong, you do not reliably know where the ends of lines are, and you haven't interpreted the file as TeX at all, so the comment character really can't have an effect here.

Ummm. Is that really how XeTeX does it?
How then does Jonathan’s
    \XeTeXdefaultencoding "iso-8859-1"
ever work?
Just a rhetorical question; don’t bother answering.   :-)

This mapping is invisible to the TeX macro layer, just as you can change the internal character code mapping in classic TeX to take an EBCDIC stream; if you do that and then read an ASCII file, you get rubbish with no hope of recovery.



    So I don't think such a switch should be automatic to avoid
    reporting encoding errors.

    I reported the issue at xstring here
    https://framagit.org/unbonpetit/xstring/-/issues/4


I looked at what you said here, and some of it doesn’t seem to be in accord with
my TeXLive installations.

viz.

/usr/local/texlive/2016/.../xstring.tex:\expandafter\ifx\csname @latexerr\endcsname\relax% on n'utilise pas LaTeX ?
/usr/local/texlive/2016/.../xstring.tex:\fi% fin des d\'efinitions LaTeX
/usr/local/texlive/2016/.../xstring.tex:%   - Le package ne n\'ecessite plus LaTeX et est d\'esormais utilisable sous
/usr/local/texlive/2016/.../xstring.tex:%     Plain eTeX.
/usr/local/texlive/2017/.../xstring.tex:% conditions of the LaTeX Project Public License, either version 1.3
/usr/local/texlive/2017/.../xstring.tex:% and version 1.3 or later is part of all distributions of LaTeX
/usr/local/texlive/2017/.../xstring.tex:\expandafter\ifx\csname @latexerr\endcsname\relax% on n'utilise pas LaTeX ?
/usr/local/texlive/2017/.../xstring.tex:\fi% fin des d\'efinitions LaTeX
/usr/local/texlive/2017/.../xstring.tex:%   - Le package ne n\'ecessite plus LaTeX et est d\'esormais utilisable sous
/usr/local/texlive/2017/.../xstring.tex:%     Plain eTeX.
/usr/local/texlive/2018/.../xstring.tex:% !TeX encoding = ISO-8859-1
/usr/local/texlive/2018/.../xstring.tex:% Licence    : Released under the LaTeX Project Public License v1.3c %
/usr/local/texlive/2018/.../xstring.tex:%     Plain eTeX.
/usr/local/texlive/2019/.../xstring.tex:% !TeX encoding = ISO-8859-1
/usr/local/texlive/2019/.../xstring.tex:% Licence    : Released under the LaTeX Project Public License v1.3c %
/usr/local/texlive/2019/.../xstring.tex:%     Plain eTeX.

Prior to 2018, the accents in comments were written with ASCII control sequences, so the file was valid UTF-8, but not intentionally so.

In 2017, the accents in comments became Latin-1 characters.
A first line was added: % !TeX encoding = ISO-8859-1
to indicate this.

Such directive comments are useless, except at the beginning of the main document source.
They are for Front-End software, not TeX processing, right?

They're for front-end software, but not only for the main document source; any file could have an encoding directive to tell the editor how to load/save it.


Jonathan, David,
so far as I can tell, it was *never* in UTF-8 with preformed accents.



I have a copy of xstring.tex here (in an old TeXlive tree) that is dated

  \def\xstringversion     {1.7c}
  \def\xstringdate        {2013/10/13}

where many of the accents (in comments) are encoded "TeX-style" with control sequences, but there are also some that are literal accented letters, and they're in UTF-8. If I load this file as Latin-1 in my editor, those letters are garbled.

(They're even mixed with the TeX-style sequences within a single line, sometimes:

% 2) Ensuite, on d\'etokenize ce d\'eveloppement de façon n'avoir plus que

Notice what happened to "façon" there when read as Latin-1...)
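
The same effect can be reproduced outside an editor. The following is only an illustrative Python sketch (it is not part of xstring or XeTeX, and the sample word is simply the one from the comment line above). It shows how the two UTF-8 bytes for "ç" turn into two unrelated characters when decoded as Latin-1, and that the opposite mismatch typically fails outright rather than silently:

```python
# Illustration only: encode a word as UTF-8, then decode the same bytes
# as ISO-8859-1 (Latin-1), as an editor with the wrong setting would.
word = "façon"

utf8_bytes = word.encode("utf-8")           # 'ç' becomes the two bytes C3 A7
misread = utf8_bytes.decode("iso-8859-1")   # each byte is read as one character

print(utf8_bytes)   # b'fa\xc3\xa7on'
print(misread)      # faÃ§on

# The reverse mismatch is not silent: Latin-1 bytes are usually not valid UTF-8.
latin1_bytes = word.encode("iso-8859-1")    # 'ç' is the single byte E7
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)   # False
```

Decoding as Latin-1 never raises an error, because every byte value is a valid Latin-1 character; that is why the garbling goes unnoticed until someone reads the text.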

It does sound like they later did a deliberate conversion to Latin-1 (contrary to what I was guessing); this is unfortunate, in that it means the file will be misread by software that expects UTF-8, which is the de facto default encoding for text these days.

So I think switching to UTF-8 would be a better choice; if they don't want to do that, adding a \XeTeXinputencoding line would be helpful.
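
For what the switch to UTF-8 would involve mechanically, here is a minimal Python sketch. It assumes the whole file really is Latin-1; the helper name `latin1_to_utf8` and the sample file names are made up for illustration, and this is not a tool used by the xstring maintainers. Any `% !TeX encoding` directive in the file would still need to be updated by hand afterwards.

```python
import tempfile
from pathlib import Path


def latin1_to_utf8(src: Path, dst: Path) -> None:
    """Hypothetical helper: re-encode a Latin-1 file as UTF-8.

    Working on raw bytes leaves line endings exactly as they were.
    """
    text = src.read_bytes().decode("iso-8859-1")  # every byte maps to one character
    dst.write_bytes(text.encode("utf-8"))


# Small self-contained demonstration with a made-up comment line.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample-latin1.tex"
    dst = Path(tmp) / "sample-utf8.tex"
    src.write_bytes("% de façon\n".encode("iso-8859-1"))
    latin1_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted.decode("utf-8"))
```

Running this on a file that is already UTF-8 would garble it in the way described earlier, so the source encoding should be checked first.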

JK
