Thanks for the info.
2012/6/28 Li Li
> In Chinese, there are no boundaries between words. It is written like:
> Iamok. You would tokenize it to "I am ok".
> If you want to search for *amo*, you should treat "I am ok" as one token. In
> Chinese, fuzzy search is not very useful. Even with StandardAnalyzer,
>
It is best to keep the Analyzer used for searching consistent with the Analyzer used to build the index.
On Thu, Jun 28, 2012 at 2:31 PM, Paco Avila wrote:
> Thanks, using WhitespaceAnalyzer works, but I don't understand why
> StandardAnalyzer does not work if, according to the ChineseAnalyzer
> deprecation notice, I should use StandardAnalyzer:
>
> @deprecated Use {@li
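The advice about keeping the search-time and index-time analyzers consistent can be illustrated with a small sketch. It does not use Lucene: `WHITESPACE` and `PER_CHARACTER` are hypothetical, deliberately crude stand-ins for analyzers, and `allTermsFound` is an assumed helper that only models "every query term must exist in the index".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Toy model only: these are NOT Lucene analyzers, just functions from text to terms.
class AnalyzerConsistencyDemo {

    // Hypothetical stand-in: split on whitespace, keep everything else together.
    static final Function<String, List<String>> WHITESPACE =
            text -> Arrays.asList(text.trim().split("\\s+"));

    // Hypothetical stand-in: every non-whitespace code point becomes its own term.
    static final Function<String, List<String>> PER_CHARACTER =
            text -> text.codePoints()
                        .filter(cp -> !Character.isWhitespace(cp))
                        .mapToObj(cp -> new String(Character.toChars(cp)))
                        .collect(Collectors.toList());

    // True if every term produced from the query text is among the indexed terms.
    static boolean allTermsFound(String indexedText,
                                 Function<String, List<String>> indexAnalyzer,
                                 String queryText,
                                 Function<String, List<String>> queryAnalyzer) {
        Set<String> indexed = new HashSet<>(indexAnalyzer.apply(indexedText));
        return indexed.containsAll(queryAnalyzer.apply(queryText));
    }

    public static void main(String[] args) {
        String text = "专项信息管理.doc";
        // Same analyzer on both sides: the query terms line up with the indexed terms.
        System.out.println(allTermsFound(text, WHITESPACE, text, WHITESPACE));
        // Different analyzers: single-character query terms never equal the one long indexed term.
        System.out.println(allTermsFound(text, WHITESPACE, text, PER_CHARACTER));
    }
}
```

In this model the mismatch fails simply because the two sides produce different term strings, which is the practical reason for using the same analyzer for indexing and searching.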
In Chinese, there are no boundaries between words. It is written like:
Iamok. You would tokenize it to "I am ok".
If you want to search for *amo*, you should treat "I am ok" as one token. In
Chinese, fuzzy search is not very useful. Even with StandardAnalyzer,
it's OK to use a boolean query, because "Iamok" is
Thanks, using WhitespaceAnalyzer works, but I don't understand why
StandardAnalyzer does not work if, according to the ChineseAnalyzer
deprecation notice, I should use StandardAnalyzer:
@deprecated Use {@link StandardAnalyzer} instead, which has the same
functionality.
This is very annoying.
2012/6/27 Li Li
StandardAnalyzer will segment each character into its own token. You should use
WhitespaceAnalyzer, or your own analyzer that can tokenize it as one token,
for wildcard search.
On 2012-6-27 at 6:20 PM, "Paco Avila" wrote:
> Hi there,
>
> I have to index Chinese content and I don't get the expected results when
>
Hi there,
I have to index Chinese content and I don't get the expected results when
searching. It seems that WildcardQuery does not work properly with
Chinese characters. See the attached sample code.
I store the string "专项信息管理.doc" using the StandardAnalyzer and after that
search for "专项信*"
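
For readers following the thread, here is a minimal sketch of why the wildcard finds nothing when each character is indexed as its own token. It is a toy model in plain Java, not the attached sample code and not Lucene itself: `perCharacterTokens` and `whitespaceTokens` are hypothetical, simplified stand-ins for what StandardAnalyzer and WhitespaceAnalyzer are described as doing in this thread, and `prefixWildcardMatches` assumes a pattern that ends in a single `*`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: no Lucene classes are used; the tokenizers are simplified assumptions.
class WildcardTokenDemo {

    // Rough stand-in for per-character segmentation: each Han character is its own
    // token, runs of other letters/digits stay together, anything else separates tokens.
    static List<String> perCharacterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isIdeographic(cp)) {
                flush(run, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                run.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(run, tokens);
            }
        }
        flush(run, tokens);
        return tokens;
    }

    // Emits the pending non-Han run (if any) as one token.
    static void flush(StringBuilder run, List<String> tokens) {
        if (run.length() > 0) {
            tokens.add(run.toString());
            run.setLength(0);
        }
    }

    // Rough stand-in for whitespace tokenization: split on whitespace only.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // A trailing-* wildcard is compared against each indexed term as a whole,
    // so it can only match if a single term starts with the prefix.
    static boolean prefixWildcardMatches(List<String> terms, String pattern) {
        String prefix = pattern.substring(0, pattern.length() - 1); // assumes pattern ends with '*'
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "专项信息管理.doc";
        String pattern = "专项信*";
        System.out.println(perCharacterTokens(doc)); // [专, 项, 信, 息, 管, 理, doc]
        System.out.println(whitespaceTokens(doc));   // [专项信息管理.doc]
        System.out.println(prefixWildcardMatches(perCharacterTokens(doc), pattern)); // false
        System.out.println(prefixWildcardMatches(whitespaceTokens(doc), pattern));   // true
    }
}
```

In this model no single-character term can start with the three-character prefix, while the single whitespace-delimited term does, which is consistent with WhitespaceAnalyzer working for the wildcard search.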