I'd be willing to give it a try and can find some people to help test it. Does that mean someone would compile a special test version? This is a question from TBS, but they are not in a rush for this at the moment. IOW, do it whenever it is convenient for you; meanwhile I can tell them that it is being worked on.
Adrian
P.S. I'm not in a hurry for it either because they have first asked me to set up a Linux file server and e-mail server - both new things for me.
Chris Little wrote:
On Thu, 6 Mar 2003, Adrian Korten wrote:
We came up against a small problem with our Thai test module. When searching for a word whose characters also occur inside other words, there is no way to delimit the word. This occurs because Thai text has no visible word breaks. Somehow, the RTF engine seems to break the Thai words reasonably accurately when displaying text. However, that same logic does not seem to be in the search module.
Like Troy mentioned, we can turn on the ICU Thai word-breaking for searches. This, the option to display with whitespace word-breaks, and transliteration with whitespace word-breaks were actually the reasons why I didn't drop the relatively large Thai dictionary from ICU.
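To illustrate why word-breaking matters for searching, here is a minimal Python sketch. It is not ICU and not SWORD code: `TOY_DICTIONARY`, `segment`, and `whole_word_search` are hypothetical names, and the greedy longest-match rule is only a stand-in for the dictionary-based breaking that ICU performs with its much larger Thai dictionary.

```python
# Toy stand-in for dictionary-based Thai word breaking (assumption: greedy
# longest match over a tiny hand-made dictionary; ICU's real logic differs).
TOY_DICTIONARY = {"สอง", "นา", "นาที"}  # "two", "rice field", "minute"


def segment(text, dictionary):
    """Split text into words by taking the longest dictionary match each time."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                break
        else:
            # No dictionary word starts here: emit the single character.
            candidate = text[i]
        tokens.append(candidate)
        i += len(candidate)
    return tokens


def whole_word_search(text, word, dictionary):
    """Return True only if `word` appears as a complete segmented word."""
    return word in segment(text, dictionary)


text = "สองนาที"  # "two minutes", written without spaces

# A plain substring search finds "นา" inside "นาที", which is the problem
# described in the question.
substring_hit = "นา" in text

# After segmentation, "นา" is no longer reported because the text breaks
# into "สอง" and "นาที".
word_hit = whole_word_search(text, "นา", TOY_DICTIONARY)
```

With this toy dictionary, `substring_hit` is true while `word_hit` is false, which is the distinction a word-breaking search would restore.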
The only alternative that I could come up with is to place Unicode characters in the text as word breaks. Unicode has various characters that indicate word breaks invisibly (non-breaking spaces, hyphenable breaks). These would have to be placed in the actual text module as UTF-8 characters.
You should encode as Unicode recommends, which I assume means no divisions between words at all. Adding tags like Frank suggested wouldn't help anyway because the strip filters will strip them out before searching.
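For comparison, the invisible-break alternative from the question might look like the Python sketch below. The thread does not name a specific character, so U+200B ZERO WIDTH SPACE is an assumption here, and `mark_breaks`, `words_of`, and `strip_breaks` are hypothetical helpers, not SWORD functions. The sketch also shows the cost: the module text no longer matches the unmarked spelling until the break characters are removed again.

```python
# Assumption: U+200B ZERO WIDTH SPACE is used as the invisible word break.
ZWSP = "\u200b"


def mark_breaks(words):
    """Join already-segmented words with an invisible break character."""
    return ZWSP.join(words)


def words_of(marked_text):
    """Recover the word list from text that carries invisible breaks."""
    return marked_text.split(ZWSP)


def strip_breaks(marked_text):
    """Remove the invisible breaks, restoring the unmarked spelling."""
    return marked_text.replace(ZWSP, "")


marked = mark_breaks(["สอง", "นาที"])

# The break character is stored in the module as three UTF-8 bytes.
zwsp_bytes = ZWSP.encode("utf-8")

# Whole-word lookup becomes a simple list membership test...
has_word = "นา" in words_of(marked)

# ...but the marked text differs from the spelling without break characters.
same_as_plain = marked == "สองนาที"
```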
--Chris
_______________________________________________ sword-devel mailing list [EMAIL PROTECTED] http://www.crosswire.org/mailman/listinfo/sword-devel