Chinese needs a special analyzer. In Java Lucene there are three
choices. Two of them do a kind of bigram search: the analyzer takes
every pair of adjacent characters and indexes it, so ABCD is indexed
as AB BC CD. The same analyzer is used to prepare the search request.
From what I gather, spaces are not the appropriate "word" boundary
for Chinese.
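To make the bigram idea concrete, here is a minimal sketch in plain Java. It does not use the Lucene API; the class BigramSketch and its methods bigrams and matches are hypothetical names made up for illustration. It assumes a lone character is kept as a single token, and it treats a document as a match when it contains every bigram of the query, which is simpler than what a real index does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Hypothetical illustration only; not a Lucene or JSword class.
final class BigramSketch {

    /** Splits text into overlapping two-character tokens: ABCD -> AB, BC, CD. */
    static List<String> bigrams(String text) {
        // Work on code points so characters outside the BMP stay whole.
        int[] cps = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            tokens.add(text); // assumption: a lone character becomes its own token
            return tokens;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    /** The query goes through the same tokenizer as the indexed text. */
    static boolean matches(String indexedText, String query) {
        List<String> queryTokens = bigrams(query);
        if (queryTokens.isEmpty()) {
            return false; // nothing to search for
        }
        return new HashSet<>(bigrams(indexedText)).containsAll(queryTokens);
    }

    public static void main(String[] args) {
        System.out.println(bigrams("ABCD"));        // prints [AB, BC, CD]
        System.out.println(matches("ABCD", "BC"));  // prints true
    }
}
```

Because both sides are split the same way, a two-character query such as BC lines up with the BC token produced from ABCD, with no spaces needed in the text.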
In JSword we use the module's language to pick an appropriate
analyzer. When we added it we didn't worry about backward
compatibility. We considered it a bug fix. No one complained about
having to rebuild indexes. We did get thanks, though.
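Picking the analyzer from the module's language could look roughly like the sketch below. This is not JSword's code: the class AnalyzerPickerSketch, the Kind enum, and the rule that a language code of "zh" (or one starting with "zh-") means Chinese are all assumptions for illustration. Every other language falls back to whitespace splitting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not JSword class names.
final class AnalyzerPickerSketch {

    enum Kind { WHITESPACE, BIGRAM }

    /** Assumption: "zh" or "zh-..." marks a Chinese module. */
    static Kind pick(String lang) {
        boolean chinese = lang != null && (lang.equals("zh") || lang.startsWith("zh-"));
        return chinese ? Kind.BIGRAM : Kind.WHITESPACE;
    }

    /** Tokenizes text with whichever strategy the language selects. */
    static List<String> tokenize(String lang, String text) {
        List<String> tokens = new ArrayList<>();
        if (pick(lang) == Kind.BIGRAM) {
            // Overlapping pairs of characters, as in ABCD -> AB, BC, CD.
            int[] cps = text.codePoints().toArray();
            for (int i = 0; i + 1 < cps.length; i++) {
                tokens.add(new String(cps, i, 2));
            }
        } else {
            // Space-delimited words for languages that use spaces.
            for (String word : text.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    tokens.add(word);
                }
            }
        }
        return tokens;
    }
}
```

Since the tokens stored in an index depend on which strategy built it, switching strategies for a language means rebuilding existing indexes for that language, which is the compatibility cost mentioned above.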
In Him,
DM
On Feb 10, 2010, at 8:12 PM, Nic Carter <niccar...@mac.com> wrote:
Hi team.
I received a question the other day about searching in Chinese
Bibles. It appears that CLucene does word-based search, so if you
search for a specific character in Chinese, it will only find it if
there is a space before and after the character. To me, this sounds
like the correct behaviour, but I'm not sure that it is. Should I be
suggesting to this guy that he do a C* search, where C is the
Chinese character? Or C~? Or what do other people do when
searching in Chinese texts?
Any help anyone can give would be greatly appreciated. :)
Thanks, ybic
nic... :)
----
Nic Carter
PocketSword Developer - an iPhone Bible Study app
http://crosswire.org/pocketsword
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page