[XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms
Hello:

I am attempting to assist a colleague, who is new to TeX, in typesetting a text which includes many passages in which Burmese and Latin scripts are closely intermixed. I wanted to make it possible for my colleague to enter his text fairly naturally, as he is used to doing in Word, by simply mixing the scripts, rather than having to type a macro to switch languages/fonts at nearly every word. On tex.stackexchange I found a suggestion to use XeTeX's interchar mechanism for this purpose and adapted the code example to my own purposes.

Though this works fine on its own, it leads to problems, and sometimes crashes, in conjunction with two other desirable XeTeX features, namely its linebreak-locale and interword space-shaping mechanisms. The example below my signature demonstrates the following three-way interaction:

(A) \XeTeXlinebreaklocale "my"
(B) \XeTeXinterwordspaceshaping=2
(C) \XeTeXinterchartokenstate=1 (and accompanying char. class definitions)

A       some ligatures render incorrectly, e.g. lla လ္ +လ
B       ok, but must use explicit \selectlanguage{burmese}
C       ok, but Burmese lines only broken on spaces (unidiomatic)
A+B     ok, but must use explicit \selectlanguage{burmese}
A+C     ligature renders incorrectly
B+C     segfault if more than one switch to Burmese
A+B+C   segfault if more than one switch to Burmese

My system is macOS 13.5 on Apple M1 Pro, XeTeX 3.141592653-2.6-0.95 (TeX Live 2023).

I can certainly help my colleague work around the crashing bug by postprocessing his source with a script to insert \selectlanguage{} next to the appropriate Unicode range, but the crash is frustrating. I believe this is the same issue as was raised on StackExchange in 2019

https://tex.stackexchange.com/questions/503498/trouble-with-stacked-consonants-burmese-script

but I couldn't find any further discussion of a fix for the crash.

Many thanks for any help: perhaps I've come at this all wrong. My own XeTeX experience has almost all been in the Latin alphabet.
Best,
Andrew Goldstone

PS my example script--forgive the verbosity. The two Burmese words are just taken at random from my colleague's sample text, with the first repeated to fill out a line.

\documentclass[draft,12pt]{article}
\usepackage[english]{babel}
\babelprovide[import]{burmese}
\babelfont[burmese]{rm}{Noto Serif Myanmar Regular}

\XeTeXlinebreaklocale "my" % (A)
\XeTeXinterwordspaceshaping=2 % (B)

% (C)...

\newXeTeXintercharclass\burmesesub
\newcount\myCount
\myCount="1000
\loop\ifnum\myCount<"109F
  \XeTeXcharclass\myCount=\burmesesub
  \advance\myCount by 1
\repeat

\XeTeXinterchartoks 0 \burmesesub = {\begingroup\selectlanguage{burmese}}
\XeTeXinterchartoks 4095 \burmesesub = {\begingroup\selectlanguage{burmese}}
\XeTeXinterchartoks \burmesesub 0 = {\endgroup}
\XeTeXinterchartoks \burmesesub 4095 = {\endgroup}

\XeTeXinterchartokenstate=1

% ...(C)

\begin{document}

ထက်လုလ္လ
thak·lulla
ထက်လုလ္လ
thak·lulla
ထက်လုလ္လ
thak·lulla
ထက်လုလ္လ
thak·lulla

သည် ၊ saññ·|

\end{document}
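For comparison, here is a minimal sketch of the explicit-switching configuration reported above as working (A+B, with the interchar mechanism left off). The \bur macro is a hypothetical shorthand introduced only for illustration; it is not part of babel or of the original example, and it simply wraps the same \begingroup\selectlanguage{burmese} ... \endgroup pair that the interchar tokens would otherwise insert.

```latex
\documentclass[12pt]{article}
\usepackage[english]{babel}
\babelprovide[import]{burmese}
\babelfont[burmese]{rm}{Noto Serif Myanmar Regular}

\XeTeXlinebreaklocale "my"     % (A)
\XeTeXinterwordspaceshaping=2  % (B)
% (C) deliberately omitted: \XeTeXinterchartokenstate stays at 0

% Hypothetical helper: switch to Burmese for the argument only.
\newcommand{\bur}[1]{\begingroup\selectlanguage{burmese}#1\endgroup}

\begin{document}

\bur{ထက်လုလ္လ} thak·lulla
\bur{ထက်လုလ္လ} thak·lulla

\bur{သည် ၊} saññ·|

\end{document}
```

This is the markup a postprocessing script would have to generate around each run of characters in the Burmese range.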
Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms
You can try https://github.com/Pomax/ucharclasses

I have used it in the past with Devanagari, Tamil, Gujarati scripts and English.

On Wed, Sep 6, 2023, 11:23 AM Andrew Goldstone wrote:

> [...] On tex.stackexchange I found a suggestion to use XeTeX's interchar
> mechanism for this purpose and adapted the code example to my own purposes.
> [...]
Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms
Thank you for the hint about ucharclasses! That saves my writing the \XeTeXinterchartoks lines myself and does (rather mysteriously?) seem to avoid the segfault in conjunction with \XeTeXinterwordspaceshaping=2. The output with \XeTeXlinebreaklocale "my" still looks wrong--it breaks a ligature (i.e. a conjunct consonant) apart at a line break--but this is much closer to what my colleague needs.

Thanks again. Hoping someone may be able to add more about the Burmese-specific aspect of all this.

All best,
Andrew

On Wed, Sep 6, 2023 at 12:33 PM Shree Devi Kumar wrote:

> You can try https://github.com/Pomax/ucharclasses
>
> I have used it in the past with Devanagari, Tamil, Gujarati scripts and
> English.
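A rough sketch of what the ucharclasses-based preamble described in this message might look like. It rests on assumptions: that the package names the U+1000..U+109F block "Myanmar", that the block name can be passed as a package option to load only that block, and that \setTransitionsFor takes the block name followed by the "entering" and "leaving" token lists. Please check these against the ucharclasses documentation before relying on them.

```latex
\documentclass[12pt]{article}
\usepackage[english]{babel}
\babelprovide[import]{burmese}
\babelfont[burmese]{rm}{Noto Serif Myanmar Regular}

\XeTeXinterwordspaceshaping=2  % (B)
% (A) \XeTeXlinebreaklocale "my" is left out here, since it was
% reported to split a conjunct consonant at a line break.

% Assumption: "Myanmar" is the ucharclasses name for U+1000..U+109F,
% and passing it as an option loads only that block.
\usepackage[Myanmar]{ucharclasses}

% Assumption: arguments are <block>, <tokens on entering>, <tokens on leaving>.
% The token lists mirror the hand-written \XeTeXinterchartoks in the
% original example.
\setTransitionsFor{Myanmar}
  {\begingroup\selectlanguage{burmese}}
  {\endgroup}

\begin{document}

ထက်လုလ္လ thak·lulla ထက်လုလ္လ thak·lulla

သည် ၊ saññ·|

\end{document}
```

With this setup the source can mix the two scripts freely, as in the original example, without the character-class loop or the four \XeTeXinterchartoks lines.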
Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms
> You can try https://github.com/Pomax/ucharclasses

No need to use the version from GitHub. The ucharclasses in TeX Live is up to date.

Werner