[XeTeX] Confused (why are ignored characters not "removed from the input" as per Eijkhout's TeX by Topic ?)
Given:

    \catcode 9 = 9
    \let ~ = \undefined
    ^^I~
    \end

why does XeTeX report

    This is XeTeX, Version 3.141592653-2.6-0.93 (TeX Live 2021/W32TeX) (preloaded format=xetex)
     restricted \write18 enabled.
    entering extended mode
    (./untitled-10.tex
    ! Undefined control sequence.
    l.3 ^^I~

    ?

This output makes it appear that the offending control sequence is (or might be) ^^I~, whereas it is in fact simply ~. Should not the ^^I have been ignored rather than reported, as per line 1 ?

Note line 3 does not really contain ^^I~ but rather <tab>~, but as tabs cannot be reliably included in e-mail I represent them here as ^^I. The ^^I in the transcript is genuine.

-- 
Philip Taylor
Re: [XeTeX] Confused (why are ignored characters not "removed from the input" as per Eijkhout's TeX by Topic ?)
On 16/04/2023 17:16, Philip Taylor (RHBNC) wrote:
> Given:
>
>     \catcode 9 = 9
>     \let ~ = \undefined
>     ^^I~
>     \end
>
> why does XeTeX report
>
>     This is XeTeX, Version 3.141592653-2.6-0.93 (TeX Live 2021/W32TeX) (preloaded format=xetex)
>      restricted \write18 enabled.
>     entering extended mode
>     (./untitled-10.tex
>     ! Undefined control sequence.
>     l.3 ^^I~
>
>     ?
>
> This output makes it appear that the offending control sequence is (or might be) ^^I~, whereas it is in fact simply ~. Should not the ^^I have been ignored rather than reported, as per line 1 ?
>
> Note line 3 does not really contain ^^I~ but rather <tab>~, but as tabs cannot be reliably included in e-mail I represent them here as ^^I. The ^^I in the transcript is genuine.

First note: I see the same result with plain TeX. So not a XeTeX issue.

Anyway, this is expected behavior. The ^^I isn't part of the offending control sequence; it's just the preceding context on the line, which is what normally appears in a TeX error message.

Perhaps this is clearer if you add some more surrounding text:

    \catcode 9 = 9
    \let ~ = \undefined
    abc^^I~def
    \end

results in

    (./x.tex
    ! Undefined control sequence.
    l.3 abc^^I~
               def
    ?

Ignored characters are not "removed from the input" (despite anything Eijkhout says); they're still present, just ignored.

JK
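Since tabs do not survive e-mail reliably, the same experiment can be tried with a printable character given category 9. This is only a sketch under that assumption: the letter Q is an arbitrary stand-in for the tab and does not appear in the original test file, and the added comment line shifts the line number reported in the error.

```tex
% Sketch only: Q is an arbitrary stand-in for the tab character.
\catcode`\Q=9        % make Q an ignored character (category 9)
\let~=\undefined     % leave the active character ~ without a meaning
abcQ~def
\end
```

With plain TeX the error context should end in abcQ~ with def on the continuation line: the Q is echoed because the context is the input line as read so far, not the tokens made from it.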
Re: [XeTeX] Confused (why are ignored characters not "removed from the input" as per Eijkhout's TeX by Topic ?)
On 16/04/2023 17:37, Jonathan Kew wrote:
> First note: I see the same result with plain TeX. So not a XeTeX issue.

Agreed, but as I use only XeTeX this seemed the right place to ask. I did test with PdfTeX (not having plain TeX in my TeXworks armoury) and PdfTeX did much the same but put a real tab in the transcript as opposed to ^^I.

> Anyway, this is expected behavior. The ^^I isn't part of the offending control sequence; it's just the preceding context on the line, which is what normally appears in a TeX error message.

OK, so I should have realised that in the absence of a leading "\", ^^I~ could not possibly be a control sequence and therefore the "~" had to be an active character. That fact had passed me by ...

> Ignored characters are not "removed from the input" (despite anything Eijkhout says); they're still present, just ignored.

Having consulted Eijkhout first, I did then search the TeXbook to see if I could find a definitive statement concerning the treatment of ignored characters but failed to do so; perhaps I should search the PDF version rather than the printed ...

-- 
/Philip Taylor/
Re: [XeTeX] Confused (why are ignored characters not "removed from the input" as per Eijkhout's TeX by Topic ?)
On Sun, 16 Apr 2023 at 18:44, Philip Taylor (Hellenic Institute) wrote:
>
> On 16/04/2023 17:37, Jonathan Kew wrote:
>
> > First note: I see the same result with plain TeX. So not a XeTeX issue.
>
> Agreed, but as I use only XeTeX this seemed the right place to ask. I did
> test with PdfTeX (not having plain TeX in my TeXworks armoury) and PdfTeX did
> much the same but put a real tab in the transcript as opposed to ^^I.
>
> > Anyway, this is expected behavior. The ^^I isn't part of the offending
> > control sequence; it's just the preceding context on the line, which is what
> > normally appears in a TeX error message.
>
> OK, so I should have realised that in the absence of a leading "\", ^^I~
> could not possibly be a control sequence and therefore the "~" had to be an
> active character. That fact had passed me by ...
>
> > Ignored characters are not "removed from the input" (despite anything
> > Eijkhout says); they're still present, just ignored.
>
> Having consulted Eijkhout first, I did then search the TeXbook to see if I
> could find a definitive statement concerning the treatment of ignored
> characters but failed to do so; perhaps I should search the PDF version
> rather than the printed ...
>

Just what I remember: TeX's algorithms are divided between the "mouth" and the "stomach". The mouth reads the input and assigns categories, thus it sees ^^I as the character with code 0x09, assigns category 9 to it (ignored character) and sends this token to the stomach. The stomach accepts a "character token" consisting of the character 0x09 with category 9. The category says that the stomach should ignore it. What causes the error message is an active character ~ (category 13) with no definition. The error message contains the line (as given by the mouth) up to the token which caused the error. This is the reason why the ignored character appears in the error message: the character is ignored by the stomach, not by the mouth.

> --
> Philip Taylor

Zdeněk Wagner
https://www.zdenek-wagner.eu/
Re: [XeTeX] Confused (why are ignored characters not "removed from the input" as per Eijkhout's TeX by Topic ?)
On 16/04/2023 19:30, Zdeněk Wagner wrote:
> The mouth reads the input and assigns categories, thus it sees ^^I as the character with code 0x09, assigns category 9 to it (ignored character) and sends this token to the stomach.

Well, that's not what Knuth says at Exercise 7.3, Zdeněk:

    [Q] Some of the category codes 0 to 15 [...] disappear in TeX's mouth. [...] Which categories can actually reach TeX's stomach ?
    [A] 1, 2, 3, 4, 6, 7, 8, 10, 11, 12, 13

so I think that on this occasion, perhaps, your memory may not be quite as infallible as it normally is ...

-- 
/** Phil./
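One way to check the Exercise 7.3 answer by experiment is to store text containing a category 9 character in a macro and inspect what was actually stored. The following is a minimal sketch, not taken from the thread: the letter X and the macro name \probe are hypothetical choices made only for illustration.

```tex
% Sketch only: the X and the macro name \probe are hypothetical choices for illustration.
\catcode`\X=9        % X is now an ignored character (category 9)
\def\probe{aXbXc}    % both X's are dropped while the line is tokenised
\message{[\meaning\probe]}
\end
```

Under plain TeX this is expected to report macro:->abc, which suggests that no token was ever made for X, in agreement with category 9 being absent from Knuth's list. The same X would nevertheless be echoed in an error context, because that context is taken from the input line itself.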