On 9/12/2013 6:17 PM, Khaled Hosny wrote:
> Some writing systems do not use spaces to separate words, so TeX’s
> normal line breaking algorithm will fail. \XeTeXlinebreaklocale
> instructs XeTeX to break the lines based on the rule of those writing
> systems.

> ‹Locale ID› should be the ISO code of the language in question,

Hmm, wouldn't this be insufficient information? Some languages are written in multiple scripts, and I would not be surprised if word breaks are signaled differently in those different scripts. Japanese, for example?

> documentation is a bit vague, but it seems to calculate the line
> breaking position based on the Unicode character properties and the
> locale value is simply ignored).

That also seems insufficient, since multiple languages may use the same script and still have different word (and therefore line) breaking characteristics. It is perhaps closer to sufficient, though, given that scripts that don't use spaces are *perhaps* more specific to a particular language, or to a small set of similar languages--e.g. Chinese script, to the extent that Cantonese and Mandarin are similar in their word break characteristics. But here I'm *really* ignorant.

In general, word breaking in scripts that don't indicate word boundaries is a partly unsolved research problem in computational linguistics--and from what I've heard, native speakers often disagree. (If you think that's odd, you might consider 'doghouse' vs. 'dog house' in English...) So I suppose it's not surprising if this doesn't work as well in XeTeX as one might hope.
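
For anyone who wants to see how well it does work on a particular script, a minimal test file might look something like the following. This is only a sketch under my own assumptions: the font name and the Thai sample sentence are placeholders I picked, not anything from the documentation quoted above. \XeTeXlinebreaklocale and \XeTeXlinebreakskip are the XeTeX primitives; everything else is ordinary XeLaTeX.

```latex
% Compile with xelatex.
% ASSUMPTION: "Noto Serif Thai" is installed; substitute any font that
% covers the script you want to test.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Noto Serif Thai}

\begin{document}

% Ask XeTeX to look for line breaks using the rules for Thai ("th").
% Comment this line out to compare against TeX's default behaviour,
% which only breaks at spaces and hyphenation points.
\XeTeXlinebreaklocale "th"

% Glue inserted at each break opportunity found this way; a little
% stretch helps justification. The value here is an arbitrary choice.
\XeTeXlinebreakskip = 0pt plus 1pt

% Sample text with no spaces between words (hypothetical test sentence,
% repeated so that it runs longer than one line).
ภาษาไทยเขียนติดกันโดยไม่มีช่องว่างระหว่างคำ%
ภาษาไทยเขียนติดกันโดยไม่มีช่องว่างระหว่างคำ%
ภาษาไทยเขียนติดกันโดยไม่มีช่องว่างระหว่างคำ%
ภาษาไทยเขียนติดกันโดยไม่มีช่องว่างระหว่างคำ

\end{document}
```

Changing "th" to some other locale ID and recompiling should show whether the locale value makes any difference for a given script, which bears on the question raised in the quoted text.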
--
   Mike Maxwell
   "The biggest danger is not ignorance,
   but the illusion of knowledge."
   --Stephen Hawking


--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
 http://tug.org/mailman/listinfo/xetex
