Hey guys,
I'm trying to index a Chinese-language field using the CJKTokenizer, and my
searches against the index return nothing. The index is created
properly (I checked with Luke), and when I search against it in Luke the data
comes back as I would expect. The analysis page of the Solr admin
also shows the tokens I would expect. On an actual Solr search, though, nothing
is found.
Here are the relevant snippets from my confs:
<fieldtype name="text_zh" class="solr.TextField">
  <analyzer>
    <tokenizer class="org.apache.solr.analysis.ja.CJKTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldtype>
...
<field name="text" type="text_zh" indexed="true" stored="false" multiValued="true"/>
So if I send in
美聯社
it correctly creates 2 tokens
美聯 聯社
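To make the tokenization I expect concrete, here is a rough Python sketch of overlapping bigrams. It is only an illustration, not the actual CJKTokenizer code: the function name cjk_bigrams is made up, and it ignores how the real tokenizer treats non-CJK characters.

```python
def cjk_bigrams(text):
    """Split a run of CJK characters into overlapping two-character tokens."""
    if len(text) < 2:
        # A single character (or empty input) cannot form a bigram.
        return [text] if text else []
    # Slide a two-character window across the text, one character at a time.
    return [text[i:i + 2] for i in range(len(text) - 1)]


print(cjk_bigrams("美聯社"))  # ['美聯', '聯社']
```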
And if I search for 美聯 in Luke, or run it through the Solr analysis page, I get a hit.
But on an actual Solr search, I don't.
Also, I've noticed that the parsed query in Luke is:
text:"美聯 聯社"
and in Solr it is:
text:"美聯 聯社 "
There is an extra trailing space inside the quotes in the Solr parsed query. I don't
know if that makes a difference.
I'm really at a loss. Does anyone know why I don’t get search hits back?
Thanks,
Dave Keene