On Fri, 17 Feb 2012 14:10:21 +0000 Caolán McNamara <caol...@redhat.com> wrote:
> On Thu, 2012-02-16 at 23:24 +0000, Richard Wordingham wrote:
> Indeed, yeah, I suppose, assuming its as complicated as "Thai", that
> the right direction would be for someone to write for icu new
> dictionary-based breakiterators for the "nod"(?) language and then the
> rather trivial changes to LibreOffice to know about the language in
> order to mark text as that language to bubble that info down to icu

Northern Thai's not quite as simple or standardised as Siamese! One
can meet (at least) the following spelling systems:

1) Chiangmai phonetics
2) Chiangrai phonetics (different mapping of tones to Siamese spelling
   rules)
3) Transliteration from Tai Tham script (probably rare for connected
   text)
4) Tai Tham script

However, perhaps dictionary-based break iterators are something to be
treated like dictionaries. There are several other writing systems
that could probably benefit from them:

Thai script:
    Northern Thai
    NE Thai (for recording songs - use of Siamese tone rules scrambles
    the tonemarks compared to Siamese cognates)

Khmer script:
    Khmer - there's already a project for this set up on SourceForge.
    Pali

Tai Tham script:
    Tai Khuen
    Tai Lue
    Pali

Lao script:
    Lao

Tibetan script:
    Tibetan

I've a feeling Burmese may also have a need for dictionary based text
breaking, though it's better behaved for syllable breaking than most
of the others listed here. Shan would come in the same category.

The above list is not exhaustive. Tai Lue in Lao script probably
belongs in the list.

Not all Thai script writing systems need a break iterator - some of
the minority languages separate words with spaces, but that's
partially a matter of literacy - Thais start writing Thai with
interword gaps and then learn to suppress the gaps. Pali written in
Thai also separates words with spaces - but Pali has some very long
words!

Richard.