On Thu, Sep 12, 2013 at 07:20:30PM -0400, Mike Maxwell wrote:
> In general, word breaking in scripts that don't indicate word
> boundaries is a partly unsolved research problem in computational
> linguistics--and from what I've heard, native speakers often
> disagree. (If you think that's odd, you might consider 'doghouse'
> vs. 'dog house' in English...) So I suppose it's not surprising if
> this doesn't work as well in XeTeX as one might hope.

As I said, this is all handled by ICU (or Graphite, for Graphite
fonts). The documentation was not that clear last time I looked into
it, but it is not something I fully understand anyway:

http://userguide.icu-project.org/boundaryanalysis
http://www.icu-project.org/apiref/icu4c/classicu_1_1BreakIterator.html

Regards,
Khaled
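
To make the 'doghouse' vs. 'dog house' point concrete, here is a minimal
sketch of greedy longest-match segmentation against a word list. It is
only an illustration and is not ICU's or XeTeX's actual algorithm; the
function name segment and the tiny lexicons are made up for the example.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation (illustrative only, not ICU's method).

    Characters that start no lexicon word are emitted as single tokens.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No lexicon word starts here: emit one character and move on.
            words.append(text[i])
            i += 1
    return words


# The same input breaks differently depending on what the word list contains.
print(segment("thedoghouse", {"the", "dog", "house", "doghouse"}))
print(segment("thedoghouse", {"the", "dog", "house"}))
```

With 'doghouse' in the word list the input comes out as two words;
without it, as three. Neither result is wrong, which is roughly the
disagreement described in the quoted text.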