When the LaTeX format is built, there are tests for whether or not a 
Unicode-aware TeX engine is doing the work.  I presume that XeTeX is such a 
Unicode-aware engine, though I'm not familiar with what the definition of 
"Unicode-aware TeX engine" actually is (separate issue).

During the input of various hyphenation pattern files (a group for each 
language code), the first such file that uses non-ASCII Unicode code points is 
for Ancient Greek, in the file

/usr/local/texlive/2017/texmf-dist/tex/generic/hyph-utf8/patterns/texhyph-grc.tex

at line 61, which starts out

α1 ε1 η1 ι1 ο1 υ1 ω1 ϊ1 ...

TeX's code and specification says that only lowercase letters can appear in 
pattern words, and the definition within TeX's source code of a lowercase 
letter is any entry in the \lccode table that, when indexed by a character, 
delivers itself.

But as near as I can tell, during the building of the LaTeX format (i.e., 
running "latex.ltx") there is no TeX source code that installs any of these 
Greek letters into the \lccode table.  Therefore, I'm concluding that the XeTeX 
engine does this itself when it initializes, rather than awaiting any TeX 
source code to do it.

But there are a whole lot of lowercase letters in Unicode, so I'm wondering how 
XeTeX determines legal lowercase letters for initial pattern files?

I've tried looking at some version of the xetex.web code, but without 
illumination, I'm afraid.

TIA,

Doug McKenna



Reply via email to