On Thu, 6 Mar 2003, Adrian Korten wrote:

> We came up against a small problem with our Thai test module. When
> searching for a word whose characters are part of other words, there is
> no way to delimit the word. This occurs because Thai has no word breaks.
> Somehow, the rtf engine seems to break the Thai words reasonably
> accurately on the display of text. However, that same logic does not
> seem to be in the search module.

Like Troy mentioned, we can turn on the ICU Thai word-breaking for
searches. This, the option to display with whitespace word-breaks, and
transliteration with whitespace word-breaks were actually the reasons
why I didn't drop the relatively large Thai dictionary from ICU.

> The only alternative that I could come up with is to place Unicode
> characters in as word breaks. Unicode has various characters to indicate
> word breaks (non-breaking spaces, hyphenable breaks) invisibly. These
> would have to be placed in the actual text module as UTF8 characters.

You should encode as Unicode recommends, which I assume means no
divisions between words at all. Adding tags like Frank suggested
wouldn't help anyway because the strip filters will strip them out
before searching.

--Chris

_______________________________________________
sword-devel mailing list
[EMAIL PROTECTED]
http://www.crosswire.org/mailman/listinfo/sword-devel
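
To make the search problem concrete, here is a rough sketch of why a plain substring search over unsegmented Thai text gives false hits, and how a dictionary-driven word break avoids them. It is not SWORD or ICU code: the four-word dictionary and the names `segment`, `substring_search`, and `whole_word_search` are made up for illustration, and the greedy longest-match rule is only a simplified stand-in for what a real dictionary-based break iterator such as ICU's does.

```python
# Toy illustration only: greedy longest-match segmentation over a tiny,
# hypothetical word list. A real break iterator uses a much larger
# dictionary and a more careful algorithm.

TEARS = "\u0e19\u0e49\u0e33\u0e15\u0e32"  # "tears"; ends with the letters of "eye"
WATER = "\u0e19\u0e49\u0e33"              # "water"
EYE = "\u0e15\u0e32"                      # "eye"
FLOW = "\u0e44\u0e2b\u0e25"               # "to flow"

WORDS = {TEARS, WATER, EYE, FLOW}         # hypothetical mini dictionary


def segment(text, words=WORDS):
    """Split text by taking the longest dictionary word at each position."""
    longest = max(map(len, words), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in words:
                break
        else:
            size = 1  # no dictionary word starts here: emit one character
        tokens.append(text[i:i + size])
        i += size
    return tokens


def substring_search(text, query):
    """Naive search: matches the query anywhere, even inside a longer word."""
    return query in text


def whole_word_search(text, query):
    """Search that only matches the query as a complete segmented word."""
    return query in segment(text)


text = TEARS + FLOW  # "tears flow", written without spaces as Thai is
print(substring_search(text, EYE))   # True: "eye" matches inside "tears"
print(whole_word_search(text, EYE))  # False: the words are "tears" and "flow"
```

The module text itself stays unmodified here; the word boundaries are computed at search time, which fits the advice above to leave the text encoded without divisions between words.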