Chinese needs a special analyzer. In Java Lucene there are three choices, two of which do some kind of bigram indexing: every pair of adjacent characters is indexed as a token, so ABCD is indexed as AB BC CD. The same analyzer would then be used to prepare the search request.
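
A minimal sketch of the bigram idea described above, in plain Python rather than Lucene's own classes. The function names `cjk_bigrams` and `matches` are hypothetical, and keeping a lone character as a one-character token is an assumption of this sketch. The key point is that the document and the query pass through the same analyzer.

```python
def cjk_bigrams(text: str) -> list[str]:
    """Split text into overlapping two-character tokens (bigrams).

    Assumption for this sketch: a lone character is kept as a single
    one-character token so short strings still yield something.
    """
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def matches(document: str, query: str) -> bool:
    """Analyze document and query the same way; every query token must occur."""
    doc_tokens = set(cjk_bigrams(document))
    return all(tok in doc_tokens for tok in cjk_bigrams(query))


print(cjk_bigrams("ABCD"))     # ['AB', 'BC', 'CD']
print(matches("ABCD", "BCD"))  # True: query bigrams BC and CD are both indexed
print(matches("ABCD", "AC"))   # False: AC is never indexed
```

This sketch ignores token positions, so it behaves more like an AND query than a phrase query, and a one-character query such as `B` would not match a bigram-only index without extra handling (for example, also indexing single characters).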

From what I gather, spaces are not an appropriate "word" boundary in Chinese.
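
To illustrate why spaces make a poor boundary, here is a small sketch of naive word-based analysis that splits only on whitespace. The names `whitespace_tokens` and `word_search` are hypothetical; the Chinese sample text is written, as usual, without spaces between characters.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Naive word-based analysis: split only on whitespace."""
    return text.split()


def word_search(document: str, term: str) -> bool:
    """A term matches only if it equals a whole whitespace-delimited token."""
    return term in whitespace_tokens(document)


english = "In the beginning God created"
chinese = "起初神创造天地"  # sample Chinese text written without spaces

print(whitespace_tokens(english))   # ['In', 'the', 'beginning', 'God', 'created']
print(whitespace_tokens(chinese))   # ['起初神创造天地']: one giant "word"
print(word_search(english, "God"))  # True
print(word_search(chinese, "神"))   # False: 神 is never a token on its own
```

With whitespace splitting, a single character is only found when it happens to be surrounded by spaces, which matches the behaviour described in the question below.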

In JSword we use the module's language to pick an appropriate analyzer. When we added this we didn't worry about backward compatibility; we treated it as a bug fix. No one complained about having to rebuild their indexes. We did get thanks, though.
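
A rough sketch of picking an analyzer from a module's language code. This is not JSword's actual code: the mapping `ANALYZERS_BY_LANG`, the function `analyzer_for`, and the rule of stripping a region or script suffix (`zh-Hant` to `zh`) are all hypothetical, and only Chinese is mapped here for illustration.

```python
from typing import Callable

Analyzer = Callable[[str], list[str]]


def whitespace_analyzer(text: str) -> list[str]:
    # Default: split on whitespace only.
    return text.split()


def bigram_analyzer(text: str) -> list[str]:
    # Overlapping two-character tokens; whitespace is removed first.
    chars = "".join(text.split())
    if len(chars) < 2:
        return [chars] if chars else []
    return [chars[i:i + 2] for i in range(len(chars) - 1)]


# Hypothetical mapping from a base language code to an analyzer.
ANALYZERS_BY_LANG: dict[str, Analyzer] = {
    "zh": bigram_analyzer,
}


def analyzer_for(lang: str) -> Analyzer:
    """Pick an analyzer from the module language, defaulting to whitespace."""
    base = lang.split("-")[0].lower()  # e.g. "zh-Hant" -> "zh"
    return ANALYZERS_BY_LANG.get(base, whitespace_analyzer)


print(analyzer_for("zh")("起初神"))            # ['起初', '初神']
print(analyzer_for("en")("In the beginning"))  # ['In', 'the', 'beginning']
```

Because the same analyzer must be applied both when indexing and when searching, switching a language to a different analyzer means any existing index built with the old one has to be rebuilt.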

In Him,
   DM


On Feb 10, 2010, at 8:12 PM, Nic Carter <niccar...@mac.com> wrote:


Hi team.

I received a question the other day about searching in Chinese Bibles. It appears that CLucene does word-based search, so if you search for a specific Chinese character, it will only be found if there is a space before and after it. To me this sounds like the correct behaviour, but I'm not sure that it is. Should I suggest that he do a C* search, where C is the Chinese character? Or C~? What do other people do when searching Chinese texts?

Any help anyone can give would be greatly appreciated.  :)

Thanks, ybic
   nic...  :)

----
Nic Carter
PocketSword Developer - an iPhone Bible Study app
http://crosswire.org/pocketsword


_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
