On 21 Apr 2010, at 15:05, François Charette wrote:

> BTW, I just checked the latest sources of ICU4C: there is indeed no such
> implementation for Lao yet (nor for Khmer or Myanmar afaics). I am however
> puzzled by the fact that the ICU source tarball does not appear to provide a
> Thai dictionary for word-breaking purposes, even though the engine implies
> the availability of such a dictionary (I expected a file like "thaidict.brk"
> somewhere, which is mentioned in source/tools/genrb/genrb.c). Or did I miss
> something?
It's probably in some processed/compiled form; I believe the tarball ships prebuilt data files rather than all the original sources needed to rebuild the ICU data. To get the original sources you need to check out the code from their source repository.

Indeed, there's a thaidict.txt in
http://icu-project.org/trac/browser/icu/trunk/source/data/brkitr

(I guess the name changed from .brk to .txt and the comment in genrb.c is out of date.)

JK