[XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-06 Andrew Goldstone
Hello: I am attempting to assist a colleague, who is new to TeX, in
typesetting a text which includes many passages in which Burmese and Latin
scripts are closely intermixed. I wanted to make it possible for my
colleague to enter his text fairly naturally, as he is used to doing in
Word, by simply mixing the scripts, rather than having to type a macro to switch
languages/fonts at nearly every word. On tex.stackexchange I found a
suggestion to use XeTeX's interchar mechanism for this purpose and adapted
the code example to my own purposes.

Though this works fine on its own, it leads to problems, and sometimes
crashes, in conjunction with two other desirable XeTeX features, namely its
linebreak-locale and interword space-shaping mechanisms. The example below
my signature demonstrates the following three-way interaction:

(A) \XeTeXlinebreaklocale "my"
(B) \XeTeXinterwordspaceshaping=2
(C) \XeTeXinterchartokenstate=1 (plus the accompanying character-class definitions)

A      some ligatures render incorrectly, e.g. lla (လ္ + လ)
B      OK, but requires explicit \selectlanguage{burmese}
C      OK, but Burmese lines are broken only at spaces (unidiomatic)
A+B    OK, but requires explicit \selectlanguage{burmese}
A+C    ligature renders incorrectly
B+C    segfault if there is more than one switch to Burmese
A+B+C  segfault if there is more than one switch to Burmese

My system is macOS 13.5 on Apple M1 Pro, XeTeX 3.141592653-2.6-0.95
(TeX Live 2023).

I can certainly help my colleague work around the crashing bug by
postprocessing his source with a script to insert \selectlanguage{} next to
the appropriate Unicode range, but the crash is frustrating. I believe this
is the same issue that was raised on TeX StackExchange in 2019:

https://tex.stackexchange.com/questions/503498/trouble-with-stacked-consonants-burmese-script

but I couldn't find any further discussion of a fix for the crash.
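For reference, the explicit markup that combinations B and A+B require (and that such a preprocessing script would have to emit) might look like the minimal sketch below. It assumes the same babel and font setup as the example file that follows. The report itself speaks of \selectlanguage{burmese}; babel's \foreignlanguage command and otherlanguage environment are shown here as its inline and block equivalents, as one possible choice rather than the only one.

```latex
\documentclass[12pt]{article}
\usepackage[english]{babel}
\babelprovide[import]{burmese}
\babelfont[burmese]{rm}{Noto Serif Myanmar Regular}

% Settings (A) and (B) from the report; no interchar classes, i.e. no (C)
\XeTeXlinebreaklocale "my"
\XeTeXinterwordspaceshaping=2

\begin{document}
% Short Burmese runs wrapped explicitly, as a preprocessing script might do
\foreignlanguage{burmese}{ထက်လုလ္လ} thak·lulla

% A longer Burmese passage can use the environment form instead
\begin{otherlanguage}{burmese}
သည် ၊
\end{otherlanguage}
saññ·|
\end{document}
```

The drawback, as noted above, is that every switch must be marked by hand or by a script.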

Many thanks for any help: perhaps I've come at this all wrong. My own XeTeX
experience has almost all been in the Latin alphabet. Best,
Andrew Goldstone

PS: My example file follows (forgive the verbosity). The two Burmese words
are taken at random from my colleague's sample text, with the first
repeated to fill out a line.

\documentclass[draft,12pt]{article}
\usepackage[english]{babel}
\babelprovide[import]{burmese}
\babelfont[burmese]{rm}{Noto Serif Myanmar Regular}

\XeTeXlinebreaklocale "my" % (A)
\XeTeXinterwordspaceshaping=2  % (B)

% (C)...

\newXeTeXintercharclass\burmesesub
\newcount\myCount
\myCount="1000
\loop\ifnum\myCount<"10A0 % whole Myanmar block, U+1000--U+109F inclusive
  \XeTeXcharclass\myCount=\burmesesub
  \advance\myCount by 1
\repeat

% class 0 = ordinary characters, 4095 = boundary class (start or end of a text run)
\XeTeXinterchartoks 0 \burmesesub = {\begingroup\selectlanguage{burmese}}
\XeTeXinterchartoks 4095 \burmesesub = {\begingroup\selectlanguage{burmese}}
\XeTeXinterchartoks \burmesesub 0 = {\endgroup}
\XeTeXinterchartoks \burmesesub 4095 = {\endgroup}

\XeTeXinterchartokenstate=1

% ...(C)

\begin{document}


ထက်လုလ္လ
thak·lulla
ထက်လုလ္လ
thak·lulla
ထက်လုလ္လ
thak·lulla
ထက်လုလ္လ
thak·lulla

သည် ၊ saññ·|

\end{document}


Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-06 Shree Devi Kumar
You can try https://github.com/Pomax/ucharclasses

I have used it in the past to mix Devanagari, Tamil and Gujarati scripts with English.
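A minimal sketch of how ucharclasses could replace the hand-written (C) block in the original example. It assumes the package's \setTransitionsFor{<block>}{<start code>}{<end code>} interface and its block name Myanmar (U+1000 to U+109F); both should be checked against the documentation of the installed version. The group-plus-\selectlanguage transition simply mirrors the original \XeTeXinterchartoks code.

```latex
\documentclass[12pt]{article}
\usepackage[english]{babel}
\babelprovide[import]{burmese}
\babelfont[burmese]{rm}{Noto Serif Myanmar Regular}
\usepackage{ucharclasses}

% Replaces the manual \newXeTeXintercharclass / \XeTeXinterchartoks setup (C):
% entering the Myanmar block opens a group and switches to Burmese,
% leaving it closes the group again.
\setTransitionsFor{Myanmar}{\begingroup\selectlanguage{burmese}}{\endgroup}

% Make sure interchar tokens are active (harmless if the package already does this)
\XeTeXinterchartokenstate=1

\begin{document}
ထက်လုလ္လ thak·lulla ထက်လုလ္လ thak·lulla

သည် ၊ saññ·|
\end{document}
```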



Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-06 Andrew Goldstone
Thank you for the hint about ucharclasses! It saves me writing the
\XeTeXinterchartoks lines myself and does (rather mysteriously?) seem to
avoid the segfault in conjunction with \XeTeXinterwordspaceshaping=2. The
output with \XeTeXlinebreaklocale "my" still looks wrong: it breaks a
ligature (i.e. a conjunct consonant) apart at a line break. But this is much
closer to what my colleague needs. Thanks again. I hope someone may be able
to add more about the Burmese-specific aspect of all this. All best,
Andrew



Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-06 Werner LEMBERG


> You can try https://github.com/Pomax/ucharclasses

No need to use the version from GitHub.  TeX Live is up to date.
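To check which ucharclasses release a given TeX Live installation actually loads, LaTeX's \listfiles can be placed before \documentclass; the list of loaded files with their version dates is then written at the end of the .log file. A minimal sketch (compile with xelatex, since ucharclasses is XeTeX-only):

```latex
\listfiles % write all loaded files with their version dates to the .log
\documentclass{article}
\usepackage{ucharclasses}
\begin{document}
Test.
\end{document}
```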


Werner