Re: [XeTeX] [tex-hyphen] Hyphenation of polytonic Greek (expressed in Unicode)

Mike Maxwell Thu, 12 Sep 2013 20:18:05 -0700

On 9/12/2013 6:17 PM, Khaled Hosny wrote:

> Some writing systems do not use spaces to separate words, so TeX’s
> normal line breaking algorithm will fail. \XeTeXlinebreaklocale
> instructs XeTeX to break the lines based on the rule of those writing
> systems.
>
> ‹Locale ID› should be the ISO code of the language in question,

Hmm, wouldn't this be insufficient information? Some languages are written in multiple scripts, and I would not be surprised if word breaks are signaled differently in those different scripts. Japanese, for example?

> documentation is a bit vague, but it seems to calculate the line
> breaking position based on the Unicode character properties and the
> locale value is simply ignored).

That also seems insufficient, since multiple languages may use the same script and have different word (and therefore line) breaking characteristics. Although perhaps closer, given that scripts that don't use spaces are *perhaps* more unique to a particular language, or to a small set of similar languages--e.g. Chinese script, to the extent that Cantonese and Mandarin are similar in their word break characteristics. But here I'm *really* ignorant.

In general, word breaking in scripts that don't indicate word boundaries is a partly unsolved research problem in computational linguistics--and from what I've heard, native speakers often disagree. (If you think that's odd, you might consider 'doghouse' vs. 'dog house' in English...) So I suppose it's not surprising if this doesn't work as well in XeTeX as one might hope.
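
For anyone who has not used the primitive under discussion, here is a minimal plain XeTeX sketch of how it is typically invoked. It assumes Thai text (locale ID "th") and a Thai-capable font; the font name is only a placeholder, and neither assumption comes from this thread. Note that the locale is a single language code with no separate script setting, which is what prompts the question above.

    % Sketch only: the font name is a placeholder, substitute any
    % installed font that covers the script you are typesetting.
    \font\thaifont="Noto Serif Thai" at 12pt
    \thaifont
    % Ask XeTeX to find break opportunities using the rules for Thai.
    \XeTeXlinebreaklocale "th"
    % Glue inserted at each break opportunity, so lines can still justify.
    \XeTeXlinebreakskip = 0pt plus 1pt
    % (paragraph text written without inter-word spaces goes here)
    \bye

Whether the breaks found this way are the ones a native reader would choose is a separate matter, as discussed above.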

--
   Mike Maxwell
   "The biggest danger is not ignorance,
   but the illusion of knowledge."
   --Stephen Hawking


--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
 http://tug.org/mailman/listinfo/xetex
