In Chinese there are no word boundaries between words. Text is written like "Iamok". Normally you should tokenize that as "I am ok", but if you want to search for *amo*, you have to treat "Iamok" as a single token. In Chinese, fuzzy search is not very useful. Even with StandardAnalyzer it is fine to use a boolean query: "Iamok" is tokenized as I a m o k, so searching with the boolean query +a +m +o works. Chinese has many characters (more than 3000 in common use), and words are very short (most words have only 2 characters).
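To make the difference concrete, here is a rough sketch in plain Java. It does not call Lucene at all: tokenizePerChar is a hypothetical stand-in for an analyzer that emits one token per character (a simplification, since a real analyzer would probably keep "doc" together), and the two match helpers only imitate how a prefix wildcard and a boolean query with all clauses required behave against those tokens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration only; none of these names are Lucene APIs.
class PerCharTokenDemo {

    // Assumption: one token per letter or digit, punctuation dropped.
    static List<String> tokenizePerChar(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                tokens.add(new String(Character.toChars(cp)));
            }
            i += Character.charCount(cp);
        }
        return tokens;
    }

    // A prefix wildcard is compared against each indexed token on its own.
    static boolean prefixMatchesAnyToken(List<String> tokens, String prefix) {
        for (String token : tokens) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Imitates a boolean query where every character is required (+a +m +o).
    static boolean containsAllChars(List<String> tokens, String required) {
        return tokens.containsAll(tokenizePerChar(required));
    }

    public static void main(String[] args) {
        String name = "专项信息管理.doc";
        List<String> perChar = tokenizePerChar(name);
        List<String> whole = Collections.singletonList(name);

        // No single-character token starts with a three-character prefix.
        System.out.println(prefixMatchesAnyToken(perChar, "专项信")); // false
        // Kept as one token, the prefix matches.
        System.out.println(prefixMatchesAnyToken(whole, "专项信"));   // true
        // Requiring each character separately matches the per-character tokens.
        System.out.println(containsAllChars(perChar, "专项信"));      // true
    }
}
```

Note that requiring each character separately ignores order and adjacency, so it can also match documents where the characters appear apart from each other.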
On Thu, Jun 28, 2012 at 2:31 PM, Paco Avila <monk...@gmail.com> wrote:
> Thank, using Whitespace Analyzer works, but I don't understand why
> StandardAnalyzer does not work if according with the ChineseAnalyzer
> deprecation I should use StandardAnalyzer:
>
> @deprecated Use {@link StandardAnalyzer} instead, which has the same
> functionality.
>
> Is very annoying.
>
> 2012/6/27 Li Li <fancye...@gmail.com>
>
>> standard analyzer will segment each character into a token, you should use
>> whitespace analyzer or your own analyzer that can tokenize it as one token
>> for wildcard search
>> On 2012-6-27 at 6:20 PM, "Paco Avila" <monk...@gmail.com> wrote:
>>
>> > Hi there,
>> >
>> > I have to index chinese content and I don't get the expected results when
>> > searching. It seems that the WildcardQuery does not work properly with the
>> > chinese characters. See attached sample code.
>> >
>> > I store the string "专项信息管理.doc" using the StandardAnalyzer and after that
>> > search for "专项信*" and no result is given. AFAIK, it should match the
>> > "专项信息管理.doc" string but it doesn't :(
>> >
>> > NOTE: Use Lucene 3.1.0
>> >
>> > Regards.
>> > --
>> > http://www.openkm.com
>> > http://www.guia-ubuntu.org
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> --
> OpenKM
> http://www.openkm.com
> http://www.guia-ubuntu.org