Thank you for the hint about ucharclasses! That saves my writing the \XeTeXinterchartoks lines myself and does (rather mysteriously?) seem to avoid the segfault in conjunction with \XeTeXinterwordspaceshaping=2.

The output with \XeTeXlinebreaklocale "my" still looks wrong--it breaks a ligature (i.e. a conjunct consonant) apart at a line break--but this is much closer to what my colleague needs.

Thanks again. I am hoping someone may be able to add more about the Burmese-specific aspect of all this.

All best,
Andrew
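
For readers following the thread, here is a minimal, untested sketch of what the ucharclasses approach might look like in place of the hand-written character-class loop. It assumes that the package names the U+1000 to U+109F block Myanmar and that \setTransitionsFor takes the block name, the code to run on entering the block, and the code to run on leaving it. It also assumes English is the only other language in the document, so leaving the block simply reselects it. This is not necessarily the setup that was actually used.

```latex
\documentclass[12pt]{article}
\usepackage[english]{babel}
\babelprovide[import]{burmese}
\babelfont[burmese]{rm}{Noto Serif Myanmar Regular}

% Loading without options defines classes for every block (slower, but
% avoids guessing at option names).
\usepackage{ucharclasses}

% Assumed block name: Myanmar. Switch to Burmese on entering the block
% and back to English (assumed main language) on leaving it.
\setTransitionsFor{Myanmar}
  {\selectlanguage{burmese}}
  {\selectlanguage{english}}

\XeTeXlinebreaklocale "my"      % (A) from the original report
\XeTeXinterwordspaceshaping=2   % (B) from the original report

\begin{document}
ထက်လုလ္လ thak·lulla သည် ၊ saññ·|
\end{document}
```

Separately, the hand-written loop in the quoted example tests \myCount<"109F, so it stops at U+109E; a bound of "10A0 would cover the whole Myanmar block.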

On Wed, Sep 6, 2023 at 12:33 PM Shree Devi Kumar <shreesh...@gmail.com> wrote:

> You can try https://github.com/Pomax/ucharclasses
>
> I have used it in past with Devanagari, Tamil, Gujarati scripts and
> English.
>
> On Wed, Sep 6, 2023, 11:23 AM Andrew Goldstone <andrew.goldst...@gmail.com>
> wrote:
>
>> Hello: I am attempting to assist a colleague, who is new to TeX, in
>> typesetting a text which includes many passages in which Burmese and Latin
>> scripts are closely intermixed. I wanted to make it possible for my
>> colleague to enter his text fairly naturally, as he is used to doing in
>> Word, by simply mixing the scripts, rather having to type a macro to switch
>> languages/fonts at nearly every word. On tex.stackexchange I found a
>> suggestion to use XeTeX's interchar mechanism for this purpose and adapted
>> the code example to my own purposes.
>>
>> Though this works fine on its own, it leads to problems, and sometimes
>> crashes, in conjunction with two other desirable XeTeX features, namely its
>> linebreak-locale and interword space-shaping mechanisms. The example below
>> my signature demonstrates the following three-way interaction:
>>
>> (A) XeTeXlinebreaklocale="my"
>> (B) XeTeXinterwordspaceshaping=2
>> (C) XeTeXinterchartokenstate=1 (and accompanying char. class definitions)
>>
>> A       some ligatures render incorrectly, e.g. lla လ္ +လ
>> B       ok, but must use explicit \selectlanguage{burmese}
>> C       ok, but Burmese lines only broken on spaces (unidiomatic)
>> A+B     ok, but must use explicit \selectlanguage{burmese}
>> A+C     ligature renders incorrectly
>> B+C     segfault if more than one switch to Burmese
>> A+B+C   segfault if more than one switch to Burmese
>>
>> My system is macOS 13.5 on Apple M1 Pro, XeTeX 3.141592653-2.6-0.999995
>> (TeX Live 2023).
>>
>> I can certainly help my colleague work around the crashing bug by
>> postprocessing his source with a script to insert \selectlanguage{} next to
>> the appropriate Unicode range, but the crash is frustrating. I believe this
>> is the same issue as was raised on StackExchange in 2019
>>
>> https://tex.stackexchange.com/questions/503498/trouble-with-stacked-consonants-burmese-script
>>
>> but I couldn't find any further discussion of a fix for the crash.
>>
>> Many thanks for any help: perhaps I've come at this all wrong. My own
>> XeTeX experience has almost all been in the Latin alphabet. Best,
>> Andrew Goldstone
>>
>> PS my example script--forgive the verbosity. The two Burmese words are
>> just taken at random from my colleague's sample text, with the first
>> repeated to fill out a line.
>>
>> \documentclass[draft,12pt]{article}
>> \usepackage[english]{babel}
>> \babelprovide[import]{burmese}
>> \babelfont[burmese]{rm}{Noto Serif Myanmar Regular}
>>
>> \XeTeXlinebreaklocale "my" % (A)
>> \XeTeXinterwordspaceshaping=2 % (B)
>>
>> % (C)...
>>
>> \newXeTeXintercharclass\burmesesub
>> \newcount\myCount
>> \myCount="1000
>> \loop\ifnum\myCount<"109F
>> \XeTeXcharclass\myCount=\burmesesub
>> \advance\myCount by 1
>> \repeat
>>
>> \XeTeXinterchartoks 0 \burmesesub = {\begingroup\selectlanguage{burmese}}
>> \XeTeXinterchartoks 4095 \burmesesub = {\begingroup\selectlanguage{burmese}}
>> \XeTeXinterchartoks \burmesesub 0 = {\endgroup}
>> \XeTeXinterchartoks \burmesesub 4095 = {\endgroup}
>>
>> \XeTeXinterchartokenstate=1
>>
>> % ...(C)
>>
>> \begin{document}
>>
>> ထက်လုလ္လ
>> thak·lulla
>> ထက်လုလ္လ
>> thak·lulla
>> ထက်လုလ္လ
>> thak·lulla
>> ထက်လုလ္လ
>> thak·lulla
>>
>> သည် ၊ saññ·|
>>
>> \end{document}