On 21/02/2021 22:55, Ross Moore wrote:
Hi David,
On 22 Feb 2021, at 8:43 am, David Carlisle <d.p.carli...@gmail.com> wrote:
Surely the line-end characters are already known, and the bits and bytes
have been read up to that point *before* tokenisation.
This is not a pdflatex inputenc-style UTF-8 error, where a stream of
tokens fails to map.
It happens at the file-reading stage: if you have the file encoding
wrong, you cannot reliably tell where the lines end, and you haven't
interpreted the input as TeX at all, so the comment character really
can't have any effect here.
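As a rough analogy (Python's UTF-8 codec standing in for a file reader, not XeTeX's actual code), decoding turns bytes into characters before anything is tokenised, so a Latin-1 byte inside a `%` comment still trips the decoder; the `%` is never consulted. The function names are illustrative only.

```python
# A Latin-1 encoded TeX comment line: "ç" is the single byte 0xE7 in ISO-8859-1.
latin1_bytes = "% de façon\n".encode("iso-8859-1")


def decode_strict(data: bytes) -> str:
    """Decode as UTF-8, raising on any invalid byte, even one inside a % comment."""
    return data.decode("utf-8")


def decode_lenient(data: bytes) -> str:
    """Decode as UTF-8, replacing each invalid byte sequence with U+FFFD."""
    return data.decode("utf-8", errors="replace")


try:
    decode_strict(latin1_bytes)
except UnicodeDecodeError as err:
    # The decoder stops at the 0xE7 byte; it has no notion of TeX comments.
    print("strict decode failed at byte offset", err.start)

# The lenient decoder keeps going, but the original "ç" is already lost.
print(repr(decode_lenient(latin1_bytes)))
```

Either way, the damage is done before any TeX-level interpretation could treat the line as a comment.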
Ummm. Is that really how XeTeX does it?
How then does Jonathan's
\XeTeXdefaultencoding "iso-8859-1"
ever work?
Just a rhetorical question; don’t bother answering. :-)
This mapping is invisible to the TeX macro layer, just as you can
change the internal character-code mapping in classic TeX to take an
EBCDIC stream; if you do that and then read an ASCII file, you get
rubbish with no hope of recovery.
So I don't think such a switch should happen automatically just to
avoid reporting encoding errors.
I reported the issue on the xstring tracker here:
https://framagit.org/unbonpetit/xstring/-/issues/4
I looked at what you said there, and some of it doesn't seem to accord
with my TeX Live installations, viz.:
/usr/local/texlive/2016/.../xstring.tex:\expandafter\ifx\csname @latexerr\endcsname\relax% on n'utilise pas LaTeX ?
/usr/local/texlive/2016/.../xstring.tex:\fi% fin des d\'efinitions LaTeX
/usr/local/texlive/2016/.../xstring.tex:% - Le package ne n\'ecessite plus LaTeX et est d\'esormais utilisable sous
/usr/local/texlive/2016/.../xstring.tex:% Plain eTeX.
/usr/local/texlive/2017/.../xstring.tex:% conditions of the LaTeX Project Public License, either version 1.3
/usr/local/texlive/2017/.../xstring.tex:% and version 1.3 or later is part of all distributions of LaTeX
/usr/local/texlive/2017/.../xstring.tex:\expandafter\ifx\csname @latexerr\endcsname\relax% on n'utilise pas LaTeX ?
/usr/local/texlive/2017/.../xstring.tex:\fi% fin des d\'efinitions LaTeX
/usr/local/texlive/2017/.../xstring.tex:% - Le package ne n\'ecessite plus LaTeX et est d\'esormais utilisable sous
/usr/local/texlive/2017/.../xstring.tex:% Plain eTeX.
/usr/local/texlive/2018/.../xstring.tex:% !TeX encoding = ISO-8859-1
/usr/local/texlive/2018/.../xstring.tex:% Licence : Released under the LaTeX Project Public License v1.3c %
/usr/local/texlive/2018/.../xstring.tex:% Plain eTeX.
/usr/local/texlive/2019/.../xstring.tex:% !TeX encoding = ISO-8859-1
/usr/local/texlive/2019/.../xstring.tex:% Licence : Released under the LaTeX Project Public License v1.3c %
/usr/local/texlive/2019/.../xstring.tex: Plain eTeX.
Prior to 2018, the accents in comments were written as ASCII control
sequences (e.g. d\'efinitions), so the file was valid UTF-8, but not
intentionally so.
From the TeX Live 2018 tree onward (per the listing above), the accents
in comments are Latin-1 characters, and a first line,
% !TeX encoding = ISO-8859-1, was added to indicate this.
Such directive comments are useless except at the beginning of the main
document source.
They are for front-end software, not for TeX processing, right?
They're for front-end software, but not only for the main document
source; any file could have an encoding directive to tell the editor how
to load/save it.
Jonathan, David,
so far as I can tell, it was *never* in UTF-8 with preformed accents.
I have a copy of xstring.tex here (in an old TeX Live tree) that is dated:
\def\xstringversion {1.7c}
\def\xstringdate {2013/10/13}
where many of the accents (in comments) are encoded "TeX-style" with
control sequences, but there are also some that are literal accented
letters, and they're in UTF-8. If I load this file as Latin-1 in my
editor, those letters are garbled.
(Sometimes they're even mixed with the TeX-style sequences within a
single line:
% 2) Ensuite, on d\'etokenize ce d\'eveloppement de façon n'avoir plus que
Notice what happened to "façon" there when read as Latin-1...)
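A short Python illustration of that garbling (the variable names are just for this example): "ç" occupies two bytes in UTF-8, and a Latin-1 reader turns each byte into a separate character.

```python
word = "façon"

# In UTF-8, "ç" (U+00E7) is encoded as the two bytes 0xC3 0xA7.
utf8_bytes = word.encode("utf-8")

# Latin-1 maps every byte to exactly one character, so 0xC3 -> "Ã" and 0xA7 -> "§".
misread = utf8_bytes.decode("iso-8859-1")
print(misread)

# Because Latin-1 decoding never loses bytes, the damage is reversible here.
recovered = misread.encode("iso-8859-1").decode("utf-8")
print(recovered)
```

The reverse direction (Latin-1 bytes read as UTF-8) is worse: the lone 0xE7 byte is not valid UTF-8 at all, so a decoder must either stop or substitute a replacement character.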
It does sound like they later did a deliberate conversion to Latin-1
(contrary to what I was guessing); this is unfortunate, in that it means
the file will be mis-read by software that expects UTF-8, which is the
de facto default encoding for text these days.
So I think switching to UTF-8 would be a better choice; if they don't
want to do that, adding a \XeTeXinputencoding line would be helpful.
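A minimal sketch of the conversion to UTF-8, assuming the file is Latin-1 throughout and that the editor directive sits on the first line. `latin1_to_utf8` is a hypothetical helper, not part of xstring or any TeX distribution tool.

```python
DIRECTIVE_OLD = "% !TeX encoding = ISO-8859-1"
DIRECTIVE_NEW = "% !TeX encoding = UTF-8"


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 TeX source as UTF-8 and update the editor directive.

    Hypothetical helper. Bytes that already decode as UTF-8 (including pure
    ASCII) are returned unchanged, because decoding them as Latin-1 would
    garble any multi-byte characters they contain.
    """
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8: nothing to convert
    except UnicodeDecodeError:
        pass
    # Latin-1 decoding cannot fail: every byte maps to one code point.
    text = data.decode("iso-8859-1")
    lines = text.splitlines(keepends=True)
    if lines and lines[0].startswith(DIRECTIVE_OLD):
        # Keep whatever followed the directive (normally just the line end).
        lines[0] = DIRECTIVE_NEW + lines[0][len(DIRECTIVE_OLD):]
    return "".join(lines).encode("utf-8")


src = "% !TeX encoding = ISO-8859-1\n% de façon\n".encode("iso-8859-1")
out = latin1_to_utf8(src)
print(out.decode("utf-8"))
```

The UTF-8 check up front is only a heuristic: a Latin-1 file that happens to form valid UTF-8 would be left alone, which is rare for real French text but not impossible.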
JK