Doug McKenna Sun, 12 Jan 2020 19:42:23 -0800

Phil Taylor wrote: 

>| So because JSBox is required/designed to incorporate all of XeTeX's 
>| features, it must (by definition) implement/provide \Umathcode.


Just to be clear, JSBox can eventually incorporate all of XeTeX's features 
(primitives), but does not do so now. It doesn't even incorporate pdfTeX's 
features, but it is set up to. I'm merely adding XeTeX features as necessary to 
get the LaTeX macro library installed and then typeset a LaTeX document 
containing no Unicode at all. The problem is that somewhere in the LaTeX format 
initialization, an engine's ability to recognize a Unicode character (as opposed 
to a UTF-8 byte sequence) is taken to mean that the engine is XeTeX, and 
therefore that at least some of XeTeX's features are present and can be relied 
upon at format initialization time. 

>| But could not JSBox perform (or simulate) the following : 

>| \let \Umathschar = \Umathchar % use British spelling as synonym 
>| \let \Umathchar = \undefined % inhibit "load-unicode-data.tex"'s special treatment of engines that implement \Umathchar 
>| \input load-unicode-data % since it would seem that you cannot simply skip this step 
>| \let \Umathchar = \Umathschar % restore canonical meaning of \Umathchar 

It could, but it's not my code that's issuing "\input load-unicode-data". The 
reading of "load-unicode-data.tex" is embedded within my version of LaTeX's own 
initialization code, and there's no guarantee that elsewhere in that code there 
isn't some dependence on \Umathchar that such a re-definition might interfere 
with. LaTeX's code has several tests that rely on whether \Umathchar is 
defined, and the latest official comments, as David Carlisle brought to my 
attention in this thread, declare that testing for the existence of \Umathchar 
is the official, current way to go in all sorts of places. 
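
For readers who have not seen one, a test of this kind typically looks something like the sketch below. It is a generic illustration, not a quotation from LaTeX's sources, and it assumes that \undefined really is undefined at the point where it runs; the \message texts are made up for the example.

```tex
% Illustrative only: a generic existence test, not quoted from LaTeX's sources.
% Assumes \undefined has no meaning at this point.
\ifx\Umathchar\undefined
  % No \Umathchar primitive: treat the engine as a classic 8-bit one.
  \message{No \string\Umathchar: assuming an 8-bit engine}
\else
  % \Umathchar exists: code on this branch may go on to rely on
  % other Unicode-engine primitives being present too.
  \message{\string\Umathchar\space present: assuming a Unicode engine}
\fi
```

Temporarily \let-ing \Umathchar to \undefined flips every test of this shape, wherever it occurs, which is why the redefinition trick is hard to reason about.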

Such negative "let's fool some other code to get something done" hacks are 
fragile because they render the other, affected TeX code impossible to 
understand when reading it. Far better and safer is an affirmative addition to 
the various checks already being made that facially means what it says: if 
Unicode character mapping data has been loaded, don't bother. 

Here is perhaps a slightly better hack: 

If it's acceptable as the very first executable line in latex.ltx (or other 
format source files) to test the catcode value of `{ to determine whether a 
format has already been loaded or not, then it should be acceptable within 
"load-unicode-data.tex" (or the like) to include a similar test to determine 
whether to proceed with the TeX parse of the Unicode data, or to bail because 
the tables are presumably already initialized. For example, the first 
character beyond the 8-bit range (U+0100) has this entry in the Unicode data: 

0100;LATIN CAPITAL LETTER A WITH MACRON;Lu;0;L;0041 0304;;;;N;LATIN CAPITAL LETTER A MACRON;;;0101;

It is safe, I think, to assume that this Unicode character will forever be 
classified as an uppercase letter (with a lowercase mapping value of U+0101). 

When the XeTeX engine begins running, before any TeX source code is 
interpreted, the engine initializes its internal |cat_code| array (all 
1,114,112 slots) with the value |other_char| (12). It then does the usual 
classic TeX initialization to declare ASCII letters as such, etc. Later, during 
the LaTeX format's reading of "load-unicode-data.tex", a simple test to 
determine whether to continue reading the file could be made based on whether 
the catcode value of U+0100 is 11 (letter) or 12 (other). If it's already known 
as a letter, then the catcode table is not in its initial default state, and a 
second initialization is unnecessary. If it's still an |other_char| (12), then 
things need initializing for letter characters and the rest of 
"load-unicode-data.tex" should be executed. 
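
A minimal sketch of such a guard, assuming it sits near the top of "load-unicode-data.tex", that ordinary catcodes are in force there, and that it is only reached by an engine that accepts character codes above 255 (an 8-bit engine would reject \catcode"100 as a bad character code). The placement is hypothetical; nothing like this is in the actual file as far as I know.

```tex
% Hypothetical guard: skip the rest of this file if the engine has
% already classified U+0100 as a letter (catcode 11).
% Assumes a Unicode engine; \catcode"100 is an error in 8-bit TeX.
\ifnum\catcode"100=11 \expandafter\endinput\fi
% Still here: U+0100 is not yet a letter (12, other_char, in a
% freshly started engine), so carry on and parse the Unicode data
% as usual.
```

The \expandafter closes the conditional before \endinput takes effect, so no \ifnum is left dangling when the file stops being read at the end of that line.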

>>| Furthermore, the purpose of executing "load-unicode-data.tex" is precisely to 
>>| populate the \Umathchar table, as well as other Unicode character tables. 
>>| So these tables have to exist prior to executing the file. 

>| Well, do they, in the case of JSBox? From what you wrote in your original 
>| query, I thought that that [1] was the very thing that you were trying to avoid ... 
>| [1] "executing "load-unicode-data.tex" [in order] to populate the \Umathchar table". 
>| So specifically, does the \Umathchar table have to exist, in JSBox, at the point 
>| that "load-unicode-data.tex" is loaded ? 

I'm trying to avoid initializing these character mapping tables twice, 
especially when the second time (reading this file) rather inefficiently takes 
30 times longer than the first, and accomplishes nothing new. 

Thanks for thinking about my questions, I appreciate it. 

Doug McKenna

Re: [XeTeX] [EXT] A LaTeX Unicode initialization desire/question/suggestion
