Arthur Reutenauer wrote:
This is exactly the problem that TeX's hyphenation algorithm was developed for. It works just as you describe: you give a list of rules describing where words can and cannot be broken ("hyphenation patterns"), and TeX does the job of finding the "nicest" authorized break for you. Together with Mojca Miklavec, I'm responsible for maintaining the hyphenation patterns in TeX Live. If you can describe the rules more precisely, we can add patterns for Lao, Thai and Khmer to the set we already have, which is already quite large and comes from several dozen contributors all over the world. Mojca added patterns for all the major languages of India last month, but we have no languages from South-East Asia yet. I've always understood that the word-breaking rules for these languages are very different from those of other languages, but I suppose the same mechanism could be adapted; you only need to bring the linguistic knowledge!
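The pattern mechanism Arthur describes is Frank Liang's algorithm. A minimal Python sketch of it follows, offered only as an illustration and not as TeX's implementation: TeX stores patterns in a packed trie, whereas this sketch uses a plain dictionary and tries every substring. The nine English patterns are a small set commonly used to illustrate the method on the word "hyphenation"; they say nothing about Lao, Thai or Khmer. `left_min` and `right_min` mirror plain TeX's `\lefthyphenmin=2` and `\righthyphenmin=3`. All function names are hypothetical.

```python
from typing import Dict, List

def compile_patterns(patterns: List[str]) -> Dict[str, List[int]]:
    """Turn a pattern such as "hy3ph" into "hyph" -> [0, 0, 3, 0, 0].

    values[k] is the digit just before letter k; the last entry is the
    digit after the final letter (0 when absent).
    """
    table: Dict[str, List[int]] = {}
    for pat in patterns:
        letters, values = "", [0]
        for ch in pat:
            if ch.isdigit():
                values[-1] = int(ch)
            else:
                letters += ch
                values.append(0)
        table[letters] = values
    return table

def hyphenate(word: str, table: Dict[str, List[int]],
              left_min: int = 2, right_min: int = 3) -> List[int]:
    """Return positions i where a break between word[i-1] and word[i] is allowed."""
    padded = "." + word.lower() + "."  # '.' marks the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[j] belongs to the gap before padded[j]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values is not None:
                # Every matching pattern votes; the highest digit wins at each gap.
                for k, v in enumerate(values):
                    scores[start + k] = max(scores[start + k], v)
    # Odd values allow a break, even values forbid one.
    # The gap before word[i] is the gap before padded[i + 1].
    return [i for i in range(left_min, len(word) - right_min + 1)
            if scores[i + 1] % 2 == 1]

def show(word: str, positions: List[int], mark: str = "-") -> str:
    """Insert `mark` at each allowed break position."""
    pieces, prev = [], 0
    for p in positions:
        pieces.append(word[prev:p])
        prev = p
    pieces.append(word[prev:])
    return mark.join(pieces)

PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
TABLE = compile_patterns(PATTERNS)

word = "hyphenation"
print(show(word, hyphenate(word, TABLE)))  # hy-phen-ation
```

Adding a language then amounts to supplying a new pattern list. The rest of the machinery stays the same, which is why the linguistic knowledge is the missing piece.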
I agree with your analysis (and thought much the same), but there is a complication: TeX breaks lines only at spaces unless it hyphenates a word (the default behaviour). What I understand from Brian's original message (Brian: please correct me if I am wrong) is that Lao breaks lines between character pairs rather than at spaces, and that no hyphenation occurs. That is what makes it a fascinating challenge, and well worthy of attention :-)

** Phil.

--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex
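Phil's point is that in Lao the break opportunities come from pairs of adjacent characters, and nothing is inserted at a break. A minimal sketch of that line-filling step is below. It assumes a hypothetical pair predicate `can_break(prev, next)`; the real Lao rules would have to come from Brian's description, and the uppercase rule used here is only a placeholder. Width is counted in characters, each line is filled greedily, and no hyphen is ever added. TeX itself chooses breaks for a whole paragraph at once (the Knuth-Plass algorithm) rather than greedily, so this is a simplification.

```python
from typing import Callable, List

def fill_lines(text: str, width: int,
               can_break: Callable[[str, str], bool]) -> List[str]:
    """Greedy line filling for text without spaces.

    A line may end between text[i-1] and text[i] only when
    can_break(text[i-1], text[i]) is true; no hyphen is added there.
    """
    lines: List[str] = []
    start = 0
    while len(text) - start > width:
        cut = None
        # Prefer the last allowed break that still fits within `width`.
        for i in range(start + width, start, -1):
            if can_break(text[i - 1], text[i]):
                cut = i
                break
        if cut is None:
            # Nothing fits: accept an overfull line at the next allowed break,
            # or take the rest of the text if there is none.
            cut = next((i for i in range(start + width + 1, len(text))
                        if can_break(text[i - 1], text[i])), len(text))
        lines.append(text[start:cut])
        start = cut
    if start < len(text):
        lines.append(text[start:])
    return lines

# Hypothetical stand-in rule: a break is allowed only before an uppercase
# letter, playing the role of "a character that may start a new syllable".
def before_upper(prev: str, nxt: str) -> bool:
    return nxt.isupper()

print(fill_lines("AbcDefGhiJkl", 7, before_upper))  # ['AbcDef', 'GhiJkl']
```

Because the pieces are just slices of the input, joining the lines gives back the original text unchanged. This is the "no hyphenation" behaviour Phil describes.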