question about bi-gram analysis on query

Keene, David Tue, 02 Oct 2007 17:27:23 -0700

Hey guys,

I'm trying to index a field in Chinese using the CJKTokenizer, and I'm finding 
that my searches on the index are not working at all. The index is created 
properly (checked with Luke), and when I search against it with Luke the data 
comes back as I would expect. Also, when I use the analysis page of the Solr 
admin, the result is what I would expect. On an actual search, though, nothing 
is found.


Here are the relevant snippets from my confs:

<fieldtype name="text_zh" class="solr.TextField">
  <analyzer>
    <tokenizer class="org.apache.solr.analysis.ja.CJKTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldtype>

...

<field name="text" type="text_zh" indexed="true" stored="false" 
multiValued="true"/>


So if I send in
美聯社 
it correctly creates 2 tokens
美聯  聯社  
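
For reference, here is a simplified sketch of the overlapping-bigram splitting described above, written in Python with a hypothetical `cjk_bigrams` helper. It is not Lucene's CJKTokenizer, which also handles mixed Latin/CJK text and offsets; it assumes the input is a single run of CJK characters and only shows why a query for 美聯 should match a document indexed from 美聯社.

```python
def cjk_bigrams(text: str) -> list[str]:
    """Split a run of CJK characters into overlapping two-character tokens.

    Hypothetical, simplified illustration: unlike Lucene's CJKTokenizer,
    it assumes the input is one CJK run and does no Latin/CJK handling.
    """
    text = text.strip()  # drop surrounding whitespace, e.g. a trailing space
    if len(text) < 2:
        # A lone character is emitted as a single-character token.
        return [text] if text else []
    # Each adjacent pair of characters becomes one token.
    return [text[i:i + 2] for i in range(len(text) - 1)]


indexed = cjk_bigrams("美聯社")
query = cjk_bigrams("美聯")
print(indexed)  # ['美聯', '聯社']
print(query)    # ['美聯']
print(all(tok in indexed for tok in query))  # True
```

Under this assumption, every query bigram appears among the indexed bigrams, which is consistent with the hits seen in Luke and on the analysis page.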

And if I search for 美聯 in Luke or on the Solr analysis page, I get a hit. 
But on the actual search, I don't.

Also, I've noticed that the parsed query in Luke is:
text:"美聯 聯社"
and in Solr it is:
text:"美聯 聯社 "
There is an extra trailing space in the Solr parsed query. I don't know whether 
that makes a difference.

I'm really at a loss. Does anyone know why I don't get any search hits back?

Thanks,
Dave Keene
