Re: Range query on TextField

WU, Zhiqing Wed, 12 Jan 2022 09:48:22 -0800

Hi Andy,

Loads of thanks for your reply. I am trying to figure out my problem by
following your advice.


I have installed Solr (8.5) on my computer and added 4 documents to
a core.

In the 4 documents, the staffName_txt field is set to "Lindmar Deborah",
"Mr Kenyon John", " Saab Jerry" and "Gross Bob" respectively.



At first, without changing anything in managed-schema, I ran two
range queries:

q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]
result: "Lindmar Deborah", "Mr Kenyon John" and " Saab Jerry"

q: staffName_txt:[* TO "Lindmar Deborah"]
result: "Lindmar Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"
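
Andy's explanation, quoted further down, is that each token of the field is checked against the range separately. A minimal Python sketch of that idea appears to reproduce both result lists. It is only a model, not Solr code: the helper names are hypothetical, the regex is a rough stand-in for the standard tokenizer plus lowercasing, and it assumes each range endpoint is lowercased and compared as one whole string.

```python
import re

NAMES = ["Lindmar Deborah", "Mr Kenyon John", " Saab Jerry", "Gross Bob"]

def word_tokens(value):
    """Rough stand-in for word tokenization plus lowercasing (not Solr code)."""
    return [tok.lower() for tok in re.findall(r"\w+", value)]

def range_match(value, lower, upper):
    """True if ANY token of value lies in [lower, upper]; None means open-ended.

    Assumption: each endpoint is lowercased and compared as a single string.
    """
    lo = lower.lower() if lower is not None else None
    hi = upper.lower() if upper is not None else None
    return any(
        (lo is None or lo <= tok) and (hi is None or tok <= hi)
        for tok in word_tokens(value)
    )

# The two range queries from this message, applied to the four names.
first = [n for n in NAMES if range_match(n, "Gross Bob", "Lindmar Deborah")]
second = [n for n in NAMES if range_match(n, None, "Lindmar Deborah")]
print(first)
print(second)
```

Under these assumptions, "Mr Kenyon John" matches through the tokens "kenyon" and "john", " Saab Jerry" matches through "jerry", and "Gross Bob" drops out of the first query because both "gross" and "bob" sort before "gross bob".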



After that, I found the "text_general" fieldType in managed-schema:

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

...

  <dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>

...

and changed both "solr.StandardTokenizerFactory" entries to
"solr.KeywordTokenizerFactory". I restarted Solr and repeated the two range
queries:

q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]
result: "Lindmar Deborah", "Mr Kenyon John" and " Saab Jerry"

q: staffName_txt:[* TO "Lindmar Deborah"]
result: "Lindmar Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"

It seems nothing has changed in the results.
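
For comparison, here is a rough Python sketch of what whole-value matching would be expected to return if each name were indexed as one lowercased term, which is what a keyword tokenizer followed by a lowercase filter is meant to produce. The helper names are hypothetical and this is a model rather than Solr code. One possible reason the real results did not change, stated as an assumption and not confirmed in this thread: terms already in the index were produced by the old analyzer, and a schema change only takes effect for documents that are indexed again.

```python
NAMES = ["Lindmar Deborah", "Mr Kenyon John", " Saab Jerry", "Gross Bob"]

def keyword_term(value):
    """Rough stand-in for a keyword tokenizer plus lowercasing: one term per value."""
    return value.lower()

def range_match(value, lower, upper):
    """True if the whole value lies in [lower, upper]; None means open-ended."""
    term = keyword_term(value)
    lo = lower.lower() if lower is not None else None
    hi = upper.lower() if upper is not None else None
    return (lo is None or lo <= term) and (hi is None or term <= hi)

# The same two range queries, now compared against whole values.
first = [n for n in NAMES if range_match(n, "Gross Bob", "Lindmar Deborah")]
second = [n for n in NAMES if range_match(n, None, "Lindmar Deborah")]
print(first)
print(second)
```

In this model "Mr Kenyon John" is excluded from both queries. " Saab Jerry" keeps its leading space, which sorts before any letter, so it falls below "gross bob" in the first query but still matches the open-ended second one.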

Is there anything else I could change?

Looking forward to your reply.

Zhiqing

On Fri, 7 Jan 2022 at 18:12, Andy C <andycs...@gmail.com> wrote:

> The behavior of the range query would depend on how the fieldType used by
> the staffName_txt is configured.
>
> I believe you will find that TextField is not the fieldType, but the base
> class your fieldType is implemented on.
>
> To use an example from one of the provided example schemas, the "_text_"
> field is defined as using the "text_general" fieldType:
>
>    <field name="_text_" type="text_general" indexed="true" stored="false"
> multiValued="true"/>
>
> The text_general fieldType is defined as:
>
>     <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100" multiValued="true">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>         <filter class="solr.SynonymGraphFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> This fieldType definition splits the contents of the field into multiple
> tokens which each get indexed. So for example "Mr Kenyon John" would
> generate 3 tokens: "Mr", "Kenyon" and "John".
>
> If you performed your range query on this field, it would check each token
> separately to see if it was in the specified range. If any token was, the
> document would be retrieved.
>
> If you want the entire contents of the field to be treated as a single
> token, which seems to be your intent, then you should look at using a
> fieldType that is based on the Keyword Tokenizer (see
> https://solr.apache.org/guide/8_3/tokenizers.html#keyword-tokenizer).
>
> - Andy -
>
> On Fri, Jan 7, 2022 at 12:35 PM WU, Zhiqing <z...@ennov.com> wrote:
>
> > Many thanks for your reply. I have changed my query to
> > staffName_txt:["GROSS BOB" TO "LINDMAR DEBORAH"]
> > staffName_txt:["gross bob" TO "lindmar deborah"]
> > staffName_txt:["Gross Bob" TO "lindmar Deborah"]
> > Their "numFound" values are identical (177). Apart from "Mr Kenyon John", my
> > search result contains " Saab Jerry", which is very confusing.
> > Therefore, I think the problem is probably not caused by "character
> > case".
> >
> > On Fri, 7 Jan 2022 at 17:12, Srijan <shree...@gmail.com> wrote:
> >
> > > My guess is inconsistent "character case" (uppercase/lowercase) in your
> > > indexed data vs your search query. For example, I would expect something
> > > like staffName_txt:[ "Gross Bob" TO "lindmar Deborah"] to return "Mr
> > > Kenyon John", as M does indeed lie between G and l.
> > >
> > > On Fri, Jan 7, 2022 at 11:10 AM WU, Zhiqing <z...@ennov.com> wrote:
> > >
> > > > Hello,
> > > > I am learning Solr.
> > > > In "The Standard Query Parser", I find:
> > > > Range queries are not limited to date fields or even numerical fields,
> > > > but can also be used with non-date fields (e.g. title:{Aida TO Carmen})
> > > >
> > > > I tried a range query in a Solr database (8.3)
> > > > staffName_txt:[ "Gross Bob" TO "Lindmar Deborah"]
> > > > staffName_txt is defined as a TextField.
> > > > Most search results are correct but "Mr Kenyon John" is also in the
> > > > result list.
> > > > I think 'M' is after 'L' and should not be included in the result.
> > > > May I ask what is wrong in my query? Is there a way to avoid the problem?
> > > > Many thanks in advance.
> > > > Kind regards,
> > > > Zhiqing
> > > >
> > >
> >
>
