On 06/05/2015 15:09, Jonathan Kew wrote:
> On 6/5/15 14:14, Joseph Wright wrote:
>
>> Based on the current files, we have a block to set \XeTeXcharclass,
>> which only applies to XeTeX. The logic followed in that code is that
>> characters in the file LineBreak.txt which have class "ID" (ideographs)
>> not only set the \XeTeXcharclass class to 1 but also set the \catcode of
>> the code point to 11. That leads to a difference between the two Unicode
>> engines. My current feeling is that the data file should split this
>> process such that the category code change applies to both XeTeX and
>> LuaTeX, with the XeTeX-specific code separate. Does this make sense and
>> indeed does the current assignment make sense?
>>
>
> ISTM that the most appropriate (default) \catcode for characters with
> class ID is clearly letter (11), and would suggest that LuaTeX should
> follow XeTeX in this.

Well for LaTeX at least the team get to make the call here and I think
we will pull everything into line.

> So yes, splitting out the XeTeX-specific code and having LuaTeX share
> the catcode assignments makes sense.

OK, if there are no objections I have a plan on this (I'll actually keep
all of the data, I think, and alter the assignment code).

> After all, if users can write control sequences such as
>
>    \hello
>    \halló
>    \Здравствуйте
>    \ሰላም
>    \सलाम
>
> they should equally well be able to write
>
>    \你好
>    \こんにちわ
>
> and have each of these treated as single control sequences, too. This
> will not work if category ID characters are given catcode 12.

Entirely reasonable.

> If you're making improvements to unicode-letters.def, I would suggest
> also adding a section that assigns catcode 15 (invalid) to the code
> values "D800 - "DFFF (i.e. the UTF-16 surrogates, which should never be
> used in isolation as characters).

Noted: easy enough to add.
--
Joseph Wright
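
For illustration only, here is a minimal sketch of the split discussed
above. It is not the actual unicode-letters.def code. The two code points
(U+4F60 and U+597D, the characters of \你好) merely stand in for the full
set of class "ID" entries from LineBreak.txt, and the use of \count255 as
a scratch register and of the plain TeX \loop macro are assumptions made
for this example.

```tex
% Shared by XeTeX and LuaTeX: class "ID" characters become letters.
% (Two sample code points only; the real data covers all ID entries.)
\catcode"4F60=11 % U+4F60
\catcode"597D=11 % U+597D

% XeTeX only: also set the character class, as the current code does.
% Assumes \undefined really is undefined, as in plain TeX and LaTeX.
\ifx\XeTeXcharclass\undefined\else
  \XeTeXcharclass"4F60=1
  \XeTeXcharclass"597D=1
\fi

% Both engines: mark the UTF-16 surrogates "D800 to "DFFF as invalid (15).
% The group keeps \count255 local; the \catcode changes are made global.
\begingroup
  \count255="D800
  \loop
    \global\catcode\count255=15
    \ifnum\count255<"DFFF
    \advance\count255 by 1
  \repeat
\endgroup
```

With assignments along these lines, both engines would read \你好 as a
single control sequence, and only XeTeX would execute the \XeTeXcharclass
lines.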