Re: Range query on TextField

Andy C Wed, 12 Jan 2022 11:12:50 -0800

Also, it doesn't make sense to use the StopFilterFactory or
SynonymGraphFilterFactory filters in conjunction with the
KeywordTokenizerFactory, so these should be removed from the fieldType
definition (personally, I would never use the StopFilterFactory
except in specialized situations).
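
For illustration, here is a minimal sketch of what a dedicated fieldType
might look like with those filters removed. The name text_exact is
hypothetical (it is not taken from any shipped schema), and the explicit
field mapping assumes staffName_txt should no longer fall through to the
*_txt dynamicField:

```xml
<!-- Hypothetical fieldType: the whole field value is indexed as one token -->
<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <!-- KeywordTokenizer emits the entire input as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercasing keeps the comparison case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- An explicit field definition takes precedence over the *_txt dynamicField -->
<field name="staffName_txt" type="text_exact" indexed="true" stored="true"/>
```

With a definition along these lines, "Mr Kenyon John" would be indexed as
the single token "mr kenyon john", which sorts after "lindmar deborah" and
so should fall outside the range. Existing documents would still need to be
reindexed after the change.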


- Andy -

On Wed, Jan 12, 2022 at 2:02 PM Andy C <[email protected]> wrote:

> How are you changing the managed-schema? I have never used the managed
> schema feature myself, but according to the documentation (
> https://solr.apache.org/guide/8_5/overview-of-documents-fields-and-schema-design.html#solrs-schema-file)
> it should never be directly edited. Not sure how it is supposed to be
> updated.
>
> Did you recreate your indexes after changing the schema (delete the
> existing indexes and re-add your 4 documents)? This would be necessary, as
> the schema configuration at the time the documents are ingested would
> determine how they are indexed.
>
> Also, you may want to consider creating a new fieldType rather than
> modifying the text_general fieldType, and explicitly map the staffName_txt
> field to it. Otherwise you will change how searching works for all fields
> that use the text_general fieldType (you would no longer be able to
> retrieve documents by searching for individual words in the text). If you
> want to support both behaviors, you might want to create multiple versions
> of the field using the copyField feature.
>
> Hope this helps.
> - Andy -
>
> On Wed, Jan 12, 2022 at 12:48 PM WU, Zhiqing <[email protected]> wrote:
>
>> Hi Andy,
>>
>> Loads of thanks for your reply. I am trying to figure out my problem by
>> following your advice.
>>
>> I have installed Solr (8.5) on my computer and added 4 documents into
>> a core.
>>
>> In the 4 documents, staffName_txt field has been set to "Lindmar Deborah",
>> "Mr Kenyon John", " Saab Jerry" and "Gross Bob" respectively.
>>
>>
>>
>> At the beginning, without changing anything in managed-schema, I did two
>> range queries:
>>
>> q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]   result: "Lindmar
>> Deborah", "Mr Kenyon John" and " Saab Jerry"
>>
>> q: staffName_txt:[* TO "Lindmar Deborah"]             result: "Lindmar
>> Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"
>>
>>
>>
>> After that, I found the fieldType of "text_general" in managed-schema:
>>
>>   <fieldType name="text_general" class="solr.TextField"
>>       positionIncrementGap="100" multiValued="true">
>>     <analyzer type="index">
>>       <tokenizer class="solr.StandardTokenizerFactory"/>
>>       <filter class="solr.StopFilterFactory" words="stopwords.txt"
>>           ignoreCase="true"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>     </analyzer>
>>     <analyzer type="query">
>>       <tokenizer class="solr.StandardTokenizerFactory"/>
>>       <filter class="solr.StopFilterFactory" words="stopwords.txt"
>>           ignoreCase="true"/>
>>       <filter class="solr.SynonymGraphFilterFactory" expand="true"
>>           ignoreCase="true" synonyms="synonyms.txt"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>     </analyzer>
>>   </fieldType>
>>
>> ...
>>
>>   <dynamicField name="*_txt" type="text_general" indexed="true"
>> stored="true"/>
>>
>> ...
>>
>> and changed both "solr.StandardTokenizerFactory" entries to
>> "solr.KeywordTokenizerFactory". I restarted Solr and repeated the two range
>> queries:
>>
>> q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]   result: "Lindmar
>> Deborah", "Mr Kenyon John" and " Saab Jerry"
>>
>> q: staffName_txt:[* TO "Lindmar Deborah"]             result: "Lindmar
>> Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"
>>
>> It seems nothing has changed in the results.
>>
>> Is there anything else I could change?
>>
>> Looking forward to your reply.
>>
>> Zhiqing
>>
>> On Fri, 7 Jan 2022 at 18:12, Andy C <[email protected]> wrote:
>>
>> > The behavior of the range query would depend on how the fieldType used by
>> > the staffName_txt field is configured.
>> >
>> > I believe you will find that TextField is not the fieldType, but the base
>> > class your fieldType is implemented on.
>> >
>> > To use an example from one of the provided example schemas, the "_text_"
>> > field is defined as using the "text_general" fieldType:
>> >
>> >    <field name="_text_" type="text_general" indexed="true" stored="false"
>> >           multiValued="true"/>
>> >
>> > The text_general fieldType is defined as:
>> >
>> >     <fieldType name="text_general" class="solr.TextField"
>> > positionIncrementGap="100" multiValued="true">
>> >       <analyzer type="index">
>> >         <tokenizer class="solr.StandardTokenizerFactory"/>
>> >         <filter class="solr.StopFilterFactory" ignoreCase="true"
>> > words="stopwords.txt" />
>> >         <filter class="solr.LowerCaseFilterFactory"/>
>> >       </analyzer>
>> >       <analyzer type="query">
>> >         <tokenizer class="solr.StandardTokenizerFactory"/>
>> >         <filter class="solr.StopFilterFactory" ignoreCase="true"
>> > words="stopwords.txt" />
>> >         <filter class="solr.SynonymGraphFilterFactory"
>> > synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>> >         <filter class="solr.LowerCaseFilterFactory"/>
>> >       </analyzer>
>> >     </fieldType>
>> >
>> > This fieldType definition splits the contents of the field into multiple
>> > tokens which each get indexed. So for example "Mr Kenyon John" would
>> > generate 3 tokens: "mr", "kenyon" and "john" (lowercased by the
>> > LowerCaseFilterFactory).
>> >
>> > If you performed your range query on this field, it would check each token
>> > separately to see if it was in the specified range. If any token was, the
>> > document would be retrieved.
>> >
>> > If you want the entire contents of the field to be treated as a single
>> > token, which seems to be your intent, then you should look at using a
>> > fieldType that is based on the Keyword Tokenizer (see
>> > https://solr.apache.org/guide/8_3/tokenizers.html#keyword-tokenizer).
>> >
>> > - Andy -
>> >
>> > On Fri, Jan 7, 2022 at 12:35 PM WU, Zhiqing <[email protected]> wrote:
>> >
>> > > Many thanks for your reply. I have changed my query to
>> > > staffName_txt:["GROSS BOB" TO "LINDMAR DEBORAH"]
>> > > staffName_txt:["gross bob" TO "lindmar deborah"]
>> > > staffName_txt:["Gross Bob" TO "lindmar Deborah"]
>> > > Their "numFound" are identical (177). Apart from "Mr Kenyon John", my
>> > > search result contains " Saab Jerry", which is very confusing.
>> > > Therefore, I think the problem is probably not because of "character case".
>> > >
>> > > On Fri, 7 Jan 2022 at 17:12, Srijan <[email protected]> wrote:
>> > >
>> > > > My guess is inconsistent "character case" (uppercase/lowercase) in your
>> > > > indexed data vs your search query. For example, I would expect something
>> > > > like  staffName_txt:[ "Gross Bob" TO "lindmar Deborah"]   to return "Mr
>> > > > Kenyon John" as M indeed does lie between G and l.
>> > > >
>> > > > On Fri, Jan 7, 2022 at 11:10 AM WU, Zhiqing <[email protected]> wrote:
>> > > >
>> > > > > Hello,
>> > > > > I am learning Solr.
>> > > > > In "The Standard Query Parser", I find:
>> > > > > Range queries are not limited to date fields or even numerical fields,
>> > > > > but can also be used with non-date fields (e.g. title:{Aida TO Carmen})
>> > > > >
>> > > > > I tried a range query in a Solr database (8.3)
>> > > > > staffName_txt:[ "Gross Bob" TO "Lindmar Deborah"]
>> > > > > staffName_txt is defined as a TextField.
>> > > > > Most searched results are correct but "Mr Kenyon John" is also in the
>> > > > > result list.
>> > > > > I think 'M' is after 'L' and should not be included in the result.
>> > > > > May I ask what is wrong in my query? Is there a way to avoid the problem?
>> > > > > Many thanks in advance.
>> > > > > Kind regards,
>> > > > > Zhiqing
>> > > > >
>> > > >
>> > >
>> >
>>
>
