Re: [sword-devel] Search in Chinese modules

DM Smith Wed, 10 Feb 2010 17:49:08 -0800

Chinese needs a special analyzer. In Java Lucene there are 3 choices. Two of them do some kind of bigram search. Basically, the analyzer takes every two adjacent chars and indexes them as one token, so ABCD is indexed as AB BC CD. The same analyzer would be used to prepare the search request.
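
To make the bigram idea concrete, here is a minimal sketch in Python. It is not Lucene's or JSword's code; the names bigrams, build_index and search are made up for illustration, and the "index" is just a dict from token to the set of verse keys containing it. It only shows that the same tokenizer is applied to both the text and the query.

```python
def bigrams(text):
    """Return overlapping two-character tokens: ABCD -> AB, BC, CD."""
    if len(text) < 2:
        # Assumption for this sketch: a lone character is kept as its own token.
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_index(verses):
    """Map each bigram to the set of verse keys whose text contains it."""
    index = {}
    for key, text in verses.items():
        for token in bigrams(text):
            index.setdefault(token, set()).add(key)
    return index


def search(index, query):
    """Tokenize the query with the same analyzer and AND the results."""
    tokens = bigrams(query)
    if not tokens:
        return set()
    hits = [index.get(token, set()) for token in tokens]
    return set.intersection(*hits)


# Hypothetical two-verse "module" using the ABCD example from above.
index = build_index({"v1": "ABCD", "v2": "BCDE"})
print(search(index, "BC"))   # both verses contain the bigram BC
print(search(index, "ABC"))  # only v1 contains both AB and BC
```

Two limits of this toy version: it ANDs the bigrams without checking that they are adjacent, and a single-character query produces a one-character token that matches nothing in a bigram-only index. A real analyzer may handle both cases differently.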


From what I gather, spaces are not the appropriate "word" boundary in Chinese.

In JSword we use the module's lang to pick an appropriate analyzer. When we added it we didn't worry about backward compatibility. We considered it as a bug fix. No one complained about having to rebuild indexes. We did get thanks, though.
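
Picking the analyzer from the module's language could look roughly like the sketch below. This is not JSword's actual code: the ANALYZERS table, pick_analyzer, and the choice of "zh" as the only special case are assumptions made for illustration.

```python
def whitespace_tokens(text):
    """Default analyzer for this sketch: split on whitespace."""
    return text.split()


def bigram_tokens(text):
    """Overlapping two-character tokens; a lone character stays as one token."""
    pairs = [text[i:i + 2] for i in range(len(text) - 1)]
    return pairs or ([text] if text else [])


# Hypothetical table: primary language subtag -> analyzer function.
ANALYZERS = {"zh": bigram_tokens}


def pick_analyzer(lang):
    """Choose an analyzer from a module language code such as 'zh-Hans' or 'en'."""
    primary = lang.split("-")[0].lower()
    return ANALYZERS.get(primary, whitespace_tokens)


print(pick_analyzer("zh-Hans")("ABCD"))          # bigram tokens
print(pick_analyzer("en")("in the beginning"))   # whitespace tokens
```

Whatever function is picked has to be used both when building the index and when parsing the query, which is why changing it means rebuilding existing indexes.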


In Him,
   DM


On Feb 10, 2010, at 8:12 PM, Nic Carter <niccar...@mac.com> wrote:

Hi team.
I received a question the other day about searching in Chinese Bibles. It appears that clucene does word-based search & so if you search for a specific character in Chinese, it will only find it if there is a space before and after the character. To me, this sounds like the correct behaviour, but I'm not sure if it is? Should I be suggesting to this guy that he should do a C* search, where C is the Chinese character? or C~ ? or what do other people do when searching in Chinese texts?
Any help anyone can give would be greatly appreciated.  :)

Thanks, ybic
   nic...  :)

----
Nic Carter
PocketSword Developer - an iPhone Bible Study app
http://crosswire.org/pocketsword


_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page

