Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

Jonathan Kew Sun, 21 Feb 2021 15:27:49 -0800

On 21/02/2021 22:55, Ross Moore wrote:

Hi David,
On 22 Feb 2021, at 8:43 am, David Carlisle <d.p.carli...@gmail.com> wrote:
    Surely the line-end characters are already known, and the bits&bytes
    have been read up to that point *before* tokenisation.
This is not a pdflatex inputenc style utf-8 error failing to map a stream of tokens.
It is at the file reading stage, and if you have the file encoding wrong you do not know reliably what are the ends of lines and you haven't interpreted it as tex at all, so the comment character really can't have an effect here.
Ummm. Is that really how XeTeX does it?
How then does Jonathan’s
    \XeTeXdefaultencoding "iso-8859-1"
ever work ?
Just a rhetorical question; don’t bother answering.   :-)
This mapping is invisible to the tex macro layer, just as you can change the internal character code mapping in classic tex to take an EBCDIC stream; if you do that and then read an ASCII file you get rubbish with no hope to recover.
    So I don't think such a switch should be automatic to avoid
    reporting encoding errors.

    I reported the issue at xstring here
    https://framagit.org/unbonpetit/xstring/-/issues/4
I looked at what you said here, and some of it doesn't seem to be in accord with
my TeXLive installations.

viz.
/usr/local/texlive/2016/.../xstring.tex:\expandafter\ifx\csname@latexerr\endcsname\relax% on n'utilise pas LaTeX ?
/usr/local/texlive/2016/.../xstring.tex:\fi% fin des d\'efinitions LaTeX
/usr/local/texlive/2016/.../xstring.tex:% - Le package ne n\'ecessite plus LaTeX et est d\'esormais utilisable sous
/usr/local/texlive/2016/.../xstring.tex:%     Plain eTeX.
/usr/local/texlive/2017/.../xstring.tex:% conditions of the LaTeX Project Public License, either version 1.3
/usr/local/texlive/2017/.../xstring.tex:% and version 1.3 or later is part of all distributions of LaTeX
/usr/local/texlive/2017/.../xstring.tex:\expandafter\ifx\csname@latexerr\endcsname\relax% on n'utilise pas LaTeX ?
/usr/local/texlive/2017/.../xstring.tex:\fi% fin des d\'efinitions LaTeX
/usr/local/texlive/2017/.../xstring.tex:% - Le package ne n\'ecessite plus LaTeX et est d\'esormais utilisable sous
/usr/local/texlive/2017/.../xstring.tex:%     Plain eTeX.
/usr/local/texlive/2018/.../xstring.tex:% !TeX encoding = ISO-8859-1
/usr/local/texlive/2018/.../xstring.tex:% Licence : Released under the LaTeX Project Public License v1.3c %
/usr/local/texlive/2018/.../xstring.tex:%     Plain eTeX.
/usr/local/texlive/2019/.../xstring.tex:% !TeX encoding = ISO-8859-1
/usr/local/texlive/2019/.../xstring.tex:% Licence : Released under the LaTeX Project Public License v1.3c %
/usr/local/texlive/2019/.../xstring.tex:%     Plain eTeX.
Prior to 2018, the accents in comments used ASCII, so UTF-8, but not intentionally so.
In 2017, the accents in comments became Latin-1 chars.
A 1st line was added: % !TeX encoding = ISO-8859-1
to indicate this.
Such directive comments are useless, except at the beginning of the main document source.
They are for Front-End software, not TeX processing, right?

They're for front-end software, but not only for the main document source; any file could have an encoding directive to tell the editor how to load/save it.


Jonathan, David,
so far as I can tell, it was *never* in UTF-8 with preformed accents.



I have a copy of xstring.tex here (in an old TeXlive tree) that is dated

  \def\xstringversion     {1.7c}
  \def\xstringdate        {2013/10/13}

where many of the accents (in comments) are encoded "TeX-style" with control sequences, but there are also some that are literal accented letters -- and they're in utf-8. If I load this file as Latin-1 in my editor, those letters are garbled.

(They're even mixed with the TeX-style sequences within a single line, sometimes:


% 2) Ensuite, on d\'etokenize ce d\'eveloppement de faÃ§on n'avoir plus que

Notice what happened to "façon" there when read as Latin-1...)

It does sound like they later did a deliberate conversion to Latin-1 (contrary to what I was guessing); this is unfortunate, in that it means the file will be mis-read by software that expects UTF-8, which is the de facto default encoding for text these days.

So I think switching to UTF-8 would be a better choice; if they don't want to do that, adding a \XeTeXinputencoding line would be helpful.
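
As a minimal sketch of the second option (this is not the package author's actual change; the \ifdefined guard is an assumption, added so that non-XeTeX e-TeX engines skip the line), the top of a Latin-1 encoded file might look like this. \XeTeXinputencoding applies from the line after the one it appears on, so the directive and everything before it should stay plain ASCII:

```tex
% !TeX encoding = ISO-8859-1
% The line above is only a hint for editors; TeX itself ignores it.
% The guard below is an assumption: it lets engines without the
% XeTeX primitive (but with e-TeX's \ifdefined) skip the directive.
\ifdefined\XeTeXinputencoding
  \XeTeXinputencoding "iso-8859-1"
\fi
% From here on, XeTeX reads this file as Latin-1, so accented
% letters in comments no longer count as invalid UTF-8.
```

If the file itself cannot be edited, a caller-side workaround along the lines of the \XeTeXdefaultencoding approach mentioned earlier in the thread would be to switch the default for newly opened files around the load and then restore it (again a sketch; it assumes the calling document is UTF-8):

```tex
\XeTeXdefaultencoding "iso-8859-1"  % applies to files opened after this point
\input xstring.tex
\XeTeXdefaultencoding "utf8"        % restore the usual default
```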

JK
