Re: Range query on TextField

Andy C Wed, 12 Jan 2022 11:02:33 -0800

How are you changing the managed-schema? I have never used the managed
schema feature myself, but according to the documentation (
https://solr.apache.org/guide/8_5/overview-of-documents-fields-and-schema-design.html#solrs-schema-file)
it should never be directly edited. Not sure how it is supposed to be
updated.


Did you recreate your indexes after changing the schema (delete the
existing indexes and re-add your 4 documents)? This would be necessary, as
the schema configuration at the time the documents are ingested would
determine how they are indexed.

Also, you may want to consider creating a new fieldType rather than
modifying the text_general fieldType, and explicitly map the staffName_txt
field to it. Otherwise you will change how searching works for all fields
that use the text_general fieldType (you would no longer be able to
retrieve documents by searching for individual words in the text). If you
want to support both behaviors, you might want to create multiple versions
of the field using the copyField feature.
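
As a rough sketch of that approach (the names "string_lowercase" and
"staffName_key" are made up for illustration, and the TrimFilterFactory is
my own assumption to cope with values such as " Saab Jerry" that start with
a space), the schema entries might look something like this:

```xml
<!-- Hypothetical fieldType: the whole field value becomes one lowercased token -->
<fieldType name="string_lowercase" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- KeywordTokenizer emits the entire input as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Assumption: strip leading/trailing whitespace before comparing -->
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field used only for range queries; not stored -->
<field name="staffName_key" type="string_lowercase" indexed="true" stored="false"/>

<!-- Keep staffName_txt for word search and copy its value into the new field -->
<copyField source="staffName_txt" dest="staffName_key"/>
```

A range query would then target the copy, e.g.
staffName_key:["Gross Bob" TO "Lindmar Deborah"], while staffName_txt keeps
its word-level search behavior. With a managed schema these definitions
would normally be added through the Schema API rather than by editing the
file, and the documents would need to be re-indexed afterwards.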

Hope this helps.
- Andy -

On Wed, Jan 12, 2022 at 12:48 PM WU, Zhiqing <[email protected]> wrote:

> Hi Andy,
>
> Loads of thanks for your reply. I am trying to figure out my problem by
> following your advice.
>
> I have installed Solr (8.5) on my computer and added 4 documents into
> a core.
>
> In the 4 documents, staffName_txt field has been set to "Lindmar Deborah",
> "Mr Kenyon John", " Saab Jerry" and "Gross Bob" respectively.
>
>
>
> At the beginning, without changing anything in managed-schema, I did two
> range queries:
>
> q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]   result: "Lindmar
> Deborah", "Mr Kenyon John" and " Saab Jerry"
>
> q: staffName_txt:[* TO "Lindmar Deborah"]             result: "Lindmar
> Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"
>
>
>
> After that, I find the fieldType of "text_general" in managed-schema:
>
>   <fieldType name="text_general" class="solr.TextField"
>       positionIncrementGap="100" multiValued="true">
>     <analyzer type="index">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory" words="stopwords.txt"
>           ignoreCase="true"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory" words="stopwords.txt"
>           ignoreCase="true"/>
>       <filter class="solr.SynonymGraphFilterFactory" expand="true"
>           ignoreCase="true" synonyms="synonyms.txt"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>   </fieldType>
>
> ...
>
>   <dynamicField name="*_txt" type="text_general" indexed="true"
> stored="true"/>
>
> ...
>
> and change both "solr.StandardTokenizerFactory" entries to
> "solr.KeywordTokenizerFactory". I restart Solr and repeat the two range
> queries:
>
> q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]   result: "Lindmar
> Deborah", "Mr Kenyon John" and " Saab Jerry"
>
> q: staffName_txt:[* TO "Lindmar Deborah"]             result: "Lindmar
> Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"
>
> It seems nothing has changed in the results.
>
> Is there anything else I could change?
>
> Looking forward to your reply.
>
> Zhiqing
>
> On Fri, 7 Jan 2022 at 18:12, Andy C <[email protected]> wrote:
>
> > The behavior of the range query would depend on how the fieldType used by
> > the staffName_txt is configured.
> >
> > I believe you will find that TextField is not the fieldType, but the base
> > class your fieldType is implemented on.
> >
> > To use an example from one of the provided example schemas, the "_text_"
> > field is defined as using the "text_general" fieldType:
> >
> >    <field name="_text_" type="text_general" indexed="true" stored="false"
> > multiValued="true"/>
> >
> > The text_general fieldType is defined as:
> >
> >     <fieldType name="text_general" class="solr.TextField"
> > positionIncrementGap="100" multiValued="true">
> >       <analyzer type="index">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="stopwords.txt" />
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >       </analyzer>
> >       <analyzer type="query">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="stopwords.txt" />
> >         <filter class="solr.SynonymGraphFilterFactory"
> > synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >       </analyzer>
> >     </fieldType>
> >
> > This fieldType definition splits the contents of the field into multiple
> > tokens which each get indexed. So for example "Mr Kenyon John" would
> > generate 3 tokens: "mr", "kenyon" and "john" (lowercased by the
> > LowerCaseFilterFactory).
> >
> > If you performed your range query on this field, it would check each token
> > separately to see if it was in the specified range. If any token was, the
> > document would be retrieved.
> >
> > If you want the entire contents of the field to be treated as a single
> > token, which seems to be your intent, then you should look at using a
> > fieldType that is based on the Keyword Tokenizer (see
> > https://solr.apache.org/guide/8_3/tokenizers.html#keyword-tokenizer).
> >
> > - Andy -
> >
> > On Fri, Jan 7, 2022 at 12:35 PM WU, Zhiqing <[email protected]> wrote:
> >
> > > Many thanks for your reply. I have changed my query to
> > > staffName_txt:["GROSS BOB" TO "LINDMAR DEBORAH"]
> > > staffName_txt:["gross bob" TO "lindmar deborah"]
> > > staffName_txt:["Gross Bob" TO "lindmar Deborah"]
> > > Their "numFound" are identical (177). Apart from "Mr Kenyon John", my
> > > search result contains " Saab Jerry", which is very confusing.
> > > Therefore, I think the problem is probably not because of "character
> > > case".
> > >
> > > On Fri, 7 Jan 2022 at 17:12, Srijan <[email protected]> wrote:
> > >
> > > > My guess is inconsistent "character case" (uppercase/lowercase) in your
> > > > indexed data vs your search query. For example, I would expect something
> > > > like staffName_txt:[ "Gross Bob" TO "lindmar Deborah"] to return "Mr
> > > > Kenyon John" as M indeed does lie between G and l.
> > > >
> > > > On Fri, Jan 7, 2022 at 11:10 AM WU, Zhiqing <[email protected]> wrote:
> > > >
> > > > > Hello,
> > > > > I am learning Solr.
> > > > > In "The Standard Query Parser", I find:
> > > > > Range queries are not limited to date fields or even numerical fields,
> > > > > but can also be used with non-date fields (e.g. title:{Aida TO Carmen})
> > > > >
> > > > > I tried a range query in a Solr database (8.3)
> > > > > staffName_txt:[ "Gross Bob" TO "Lindmar Deborah"]
> > > > > staffName_txt is defined as a TextField.
> > > > > Most searched results are correct but "Mr Kenyon John" is also in the
> > > > > result list.
> > > > > I think 'M' is after 'L' and should not be included in the result.
> > > > > May I ask what is wrong in my query? Is there a way to avoid the problem?
> > > > > Many thanks in advance.
> > > > > Kind regards,
> > > > > Zhiqing
> > > > >
> > > >
> > >
> >
>
