[XeTeX] Confused (why are ignored characters not "removed from the input" as per Eijkhout's TeX by Topic ?)

2023-04-16 Thread Philip Taylor (RHBNC)

Given —

  \catcode 9 = 9
  \let ~ = \undefined
  ^^I~
  \end

why does XeTeX report

  This is XeTeX, Version 3.141592653-2.6-0.93 (TeX Live 2021/W32TeX) (preloaded format=xetex)
  restricted \write18 enabled.
  entering extended mode
  (./untitled-10.tex
  ! Undefined control sequence.
  l.3 ^^I~

  ?

This output makes it appear that the offending control sequence is (or might be) 
^^I~, whereas it is in fact simply ~.  Should not the ^^I have been ignored rather 
than reported, as per line 1 ?  Note line 3 does not really contain ^^I~ but rather 
~, but as tabs cannot be reliably included in e-mail I represent them here 
as ^^I.  The ^^I in the transcript is genuine.

--
Philip Taylor




Re: [XeTeX] Confused (why are ignored characters not "removed from the input" as per Eijkhout's TeX by Topic ?)

2023-04-16 Thread Jonathan Kew

On 16/04/2023 17:16, Philip Taylor (RHBNC) wrote:

> Given —
>
>   \catcode 9 = 9
>   \let ~ = \undefined
>   ^^I~
>   \end
>
> why does XeTeX report
>
>   This is XeTeX, Version 3.141592653-2.6-0.93 (TeX Live 2021/W32TeX) (preloaded format=xetex)
>   restricted \write18 enabled.
>   entering extended mode
>   (./untitled-10.tex
>   ! Undefined control sequence.
>   l.3 ^^I~
>
>   ?
>
> This output makes it appear that the offending control sequence is (or
> might be) ^^I~, whereas it is in fact simply ~.  Should not the ^^I have
> been ignored rather than reported, as per line 1 ?  Note line 3 does not
> really contain ^^I~ but rather ~, but as tabs cannot be reliably
> included in e-mail I represent them here as ^^I.  The ^^I in the
> transcript is genuine.


First note: I see the same result with plain TeX. So not a XeTeX issue.

Anyway, this is expected behavior. The ^^I isn't part of the offending 
control sequence; it's just the preceding context on the line, which is 
what normally appears in a TeX error message.


Perhaps this is clearer if you add some more surrounding text:

  \catcode 9 = 9
  \let ~ = \undefined
  abc^^I~def
  \end

results in

  (./x.tex
  ! Undefined control sequence.
  l.3 abc^^I~
             def
  ?

Ignored characters are not "removed from the input" (despite anything 
Eijkhout says); they're still present, just ignored.
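
A minimal sketch of one way to see the difference between "present in the input line" and "turned into a token", assuming plain TeX category codes (so that ^ has category 7 and ^^I denotes character 9). The macro name \x is only an illustration and is not from the thread. The tab stays in the line buffer, which is why it shows up in the error context, but it should never become part of a token list:

```tex
\catcode 9 = 9           % tab (character code 9) becomes an ignored character
\def\x{a^^Ib}            % ^^I is reduced to character 9, then skipped
\message{[\meaning\x]}   % terminal should show [macro:->ab]
\end
```

With the plain TeX default (\catcode 9 = 10) the same definition would store a space token between a and b instead.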


JK



Re: [XeTeX] Confused (why are ignored characters not "removed from the input" as per Eijkhout's TeX by Topic ?)

2023-04-16 Thread Philip Taylor (Hellenic Institute)

On 16/04/2023 17:37, Jonathan Kew wrote:

> First note: I see the same result with plain TeX. So not a XeTeX issue.

Agreed, but as I use only XeTeX this seemed the right place to ask. I 
did test with PdfTeX (not having plain TeX in my TeXworks armoury) and 
PdfTeX did much the same but put a real tab character in the transcript 
as opposed to ^^I.

> Anyway, this is expected behavior. The ^^I isn't part of the offending 
> control sequence; it's just the preceding context on the line, which 
> is what normally appears in a TeX error message.

OK, so I should have realised that in the absence of a leading "\", ^^I~ 
could not possibly be a control sequence and therefore the "~" had to be 
an active character.  That fact had passed me by ...

> Ignored characters are not "removed from the input" (despite anything 
> Eijkhout says); they're still present, just ignored.


Having consulted Eijkhout first, I did then search the TeXbook to see if 
I could find a definitive statement concerning the treatment of ignored 
characters but failed to do so — perhaps I should search the PDF version 
rather than the printed ...


--
/Philip Taylor/



Re: [XeTeX] Confused (why are ignored characters not "removed from the input" as per Eijkhout's TeX by Topic ?)

2023-04-16 Thread Zdenek Wagner
On Sun, 16 Apr 2023 at 18:44, Philip Taylor (Hellenic Institute) wrote:
>
> On 16/04/2023 17:37, Jonathan Kew wrote:
>
> First note: I see the same result with plain TeX. So not a XeTeX issue.
>
> Agreed, but as I use only XeTeX this seemed the right place to ask.  I did 
> test with PdfTeX (not having plain TeX in my TeXworks armoury) and PdfTeX did 
> much the same but put a real tab character in the transcript as opposed to ^^I.
>
> Anyway, this is expected behavior. The ^^I isn't part of the offending 
> control sequence; it's just the preceding context on the line, which is what 
> normally appears in a TeX error message.
>
> OK, so I should have realised that in the absence of a leading "\", ^^I~ 
> could not possibly be a control sequence and therefore the "~" had to be an 
> active character.  That fact had passed me by ...
>
> Ignored characters are not "removed from the input" (despite anything 
> Eijkhout says); they're still present, just ignored.
>
> Having consulted Eijkhout first, I did then search the TeXbook to see if I 
> could find a definitive statement concerning the treatment of ignored 
> characters but failed to do so — perhaps I should search the PDF version 
> rather than the printed ...
>
As far as I remember, TeX's processing is divided between the "mouth"
and the "stomach". The mouth reads the input and assigns categories, thus
it sees ^^I as the character with code 0x09, assigns category 9 to it
(ignored character) and sends this token to the stomach. The stomach
accepts a "character token" consisting of the character 0x09 with
category 9. The category says that the stomach should ignore it. What
causes the error message is an active character ~ (category 13) with
no definition. The error message contains the line (as read by
the mouth) up to the token which caused the error. This is the reason
why the ignored character appears in the error message: the character
is ignored by the stomach, not by the mouth.
> --
> Philip Taylor

Zdeněk Wagner
https://www.zdenek-wagner.eu/



Re: [XeTeX] Confused (why are ignored characters not "removed from the input" as per Eijkhout's TeX by Topic ?)

2023-04-16 Thread Philip Taylor (Hellenic Institute)

On 16/04/2023 19:30, Zdeněk Wagner wrote:


> The mouth reads the input and assigns categories, thus
> it sees ^^I as the character with code 0x09, assigns category 9 to it
> (ignored character) and sends this token to the stomach.


Well, that's not what Knuth says at Exercise 7.3, Zdeněk:

[Q] Some of the category codes 0 to 15 [...] disappear in TeX's 
mouth.  [...] Which categories can actually reach TeX's stomach ?


[A] 1, 2, 3, 4, 6, 7, 8, 10, 11, 12, 13

so I think that on this occasion, perhaps, your memory may not be quite 
as infallible as it normally is ...
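
A rough way to check this empirically, sketched under the assumption of plain TeX defaults ({ and } with categories 1 and 2, ^ with category 7). \tracingcommands reports each command as the stomach is about to execute it, so a character that reaches the stomach leaves a trace and one that is dropped in the mouth does not. The empty group {} is only there to leave TeX in a state where a following space is not skipped, and the trailing % keeps the end of line from contributing a space of its own:

```tex
\tracingonline=1 \tracingcommands=1
% Case 1: tab has category 10 (space), so a space token is passed on.
\catcode 9 = 10 {}^^I%
% Case 2: tab has category 9 (ignored), so nothing is passed on.
\catcode 9 = 9 {}^^I%
\end
```

In the first case the trace should contain a {blank space  } entry after the group closes; in the second case no entry should appear between the end of the group and {\end}. That would be consistent with category 10 being in Knuth's list and category 9 being absent from it.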


--
/** Phil./