Re: Range query on TextField

WU, Zhiqing Wed, 12 Jan 2022 14:30:26 -0800

Hi Andy,
Many thanks for your quick reply.
Yes, you are right. According to the Solr 8.5 documentation, I should not
edit "managed-schema" directly. However, when I create a new core (bin/solr
create -c newcore), I can only find managed-schema in the
server/solr/newcore/conf folder, and I cannot find schema.xml in any folder
belonging to the core. Some webpages mention renaming "managed-schema" to
schema.xml. The changes I can make to managed-schema via the Schema API seem
limited: I can add fields, but I could not work out how to change
"solr.StandardTokenizerFactory" to "solr.KeywordTokenizerFactory". I can only
find "Add Field", "Add Dynamic Field" and "Add Copy Field" (after clicking
"Schema" above "Segments info"), but I have not found anything like
"Add FieldType" in the Solr UI.
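
For reference, the Schema API itself has an add-field-type command in addition
to the three "Add" buttons shown in the admin UI. The following is only a
minimal Python sketch of such a request, not a tested recipe. It assumes Solr
is listening on localhost:8983 and that the core is called newcore; the names
keyword_lowercase and staffName_key are made up for illustration.

```python
import json
import urllib.request

# Assumption: default local Solr port and the core name used in this thread.
SCHEMA_URL = "http://localhost:8983/solr/newcore/schema"


def build_schema_commands():
    """Build one Schema API request body holding three commands."""
    return {
        # New fieldType: the whole field value is kept as one lowercased token.
        "add-field-type": {
            "name": "keyword_lowercase",  # hypothetical name
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {"class": "solr.KeywordTokenizerFactory"},
                "filters": [{"class": "solr.LowerCaseFilterFactory"}],
            },
        },
        # New field that uses the fieldType above.
        "add-field": {
            "name": "staffName_key",  # hypothetical name
            "type": "keyword_lowercase",
            "indexed": True,
            "stored": True,
        },
        # Keep staffName_txt for word searches and copy it for range queries.
        "add-copy-field": {"source": "staffName_txt", "dest": ["staffName_key"]},
    }


def post_schema_commands(url=SCHEMA_URL):
    """Send the commands to a running Solr. Never called automatically."""
    body = json.dumps(build_schema_commands()).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the request body only; nothing is sent to Solr here.
    print(json.dumps(build_schema_commands(), indent=2))
```

Calling post_schema_commands() would send the request. Range queries would
then go against staffName_key instead of staffName_txt.
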
After I installed Solr, it did not have a core. Therefore, I created a core
(newcore, empty, without any documents) and then added 4 new documents via
the "Documents" screen in the Solr UI. Once the documents have been added, do
I need to do anything else to index them?
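
Andy's earlier point was that the schema in place when documents are ingested
determines how they are indexed, so after a schema change the documents have
to be deleted and added again. A minimal Python sketch of that step through
the JSON update handler follows. The URL, the core name and the id values are
assumptions; the four names are the ones from this thread.

```python
import json
import urllib.request

# Assumption: default local Solr port and the core name used in this thread.
UPDATE_URL = "http://localhost:8983/solr/newcore/update?commit=true"

STAFF_NAMES = ["Lindmar Deborah", "Mr Kenyon John", " Saab Jerry", "Gross Bob"]


def build_reindex_bodies(names=STAFF_NAMES):
    """Return the two JSON bodies: delete everything, then re-add the docs."""
    delete_all = {"delete": {"query": "*:*"}}
    docs = [
        {"id": str(number), "staffName_txt": name}  # ids are made up
        for number, name in enumerate(names, start=1)
    ]
    return [delete_all, docs]


def reindex(url=UPDATE_URL):
    """Post both bodies to a running Solr. Never called automatically."""
    for body in build_reindex_bodies():
        request = urllib.request.Request(
            url,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            response.read()
```

The delete-by-query removes every document in the core, so this is only
reasonable for a small test core such as this one.
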


Yes, I understand that I should create a new fieldType rather than modify the
text_general fieldType. If I create a new fieldType, can I set the tokenizer
class to "solr.KeywordTokenizerFactory"?
I will remove the StopFilterFactory and SynonymGraphFilterFactory filters. I
am new to Solr, so some of my steps might be wrong.
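
To illustrate what the tokenizer change should do to the range query, here is
a rough Python model of the per-token matching Andy described. It is a
simplification for illustration, not Solr's actual implementation: the two
analyze functions are stand-ins for StandardTokenizer and KeywordTokenizer,
each followed by lowercasing, and the range bounds are compared as whole
lowercased strings.

```python
import re

NAMES = ["Lindmar Deborah", "Mr Kenyon John", " Saab Jerry", "Gross Bob"]


def word_tokens(value):
    """Stand-in for StandardTokenizer + LowerCaseFilter: one token per word."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def keyword_token(value):
    """Stand-in for KeywordTokenizer + LowerCaseFilter: one token per value."""
    return [value.lower()]


def range_match(names, lower, upper, analyze):
    """Return names with at least one token in [lower, upper]; None means '*'."""
    low = None if lower is None else lower.lower()
    high = upper.lower()
    return [
        name
        for name in names
        if any((low is None or low <= token) and token <= high
               for token in analyze(name))
    ]


if __name__ == "__main__":
    # Word tokens: "kenyon", "john" and "jerry" fall inside the range.
    print(range_match(NAMES, "Gross Bob", "Lindmar Deborah", word_tokens))
    # -> ['Lindmar Deborah', 'Mr Kenyon John', ' Saab Jerry']
    # Single token per value: only whole names inside the range match.
    print(range_match(NAMES, "Gross Bob", "Lindmar Deborah", keyword_token))
    # -> ['Lindmar Deborah', 'Gross Bob']
```

With word tokens the model reproduces the results reported earlier in this
thread, including the absence of "Gross Bob" from the first query ("gross"
sorts before "gross bob"). With a single token per value, " Saab Jerry" would
sort before every other name because of its leading space.
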

Zhiqing

On Wed, 12 Jan 2022 at 19:12, Andy C <andycs...@gmail.com> wrote:

> Also it doesn't make sense to use the StopFilterFactory or
> SynonymGraphFilterFactory filters in conjunction with the
> KeywordTokenizerFactory, so these should be removed from the fieldType
> definition (personally I would never make use of the StopFilterFactory,
> except in specialized situations).
>
> - Andy -
>
> On Wed, Jan 12, 2022 at 2:02 PM Andy C <andycs...@gmail.com> wrote:
>
> > How are you changing the managed-schema? I have never used the managed
> > schema feature myself, but according to the documentation
> > (https://solr.apache.org/guide/8_5/overview-of-documents-fields-and-schema-design.html#solrs-schema-file)
> > it should never be directly edited. Not sure how it is supposed to be
> > updated.
> >
> > Did you recreate your indexes after changing the schema (delete the
> > existing indexes and re-add your 4 documents)? This would be necessary,
> > as the schema configuration at the time the documents are ingested would
> > determine how they are indexed.
> >
> > Also, you may want to consider creating a new fieldType rather than
> > modifying the text_general fieldType, and explicitly mapping the
> > staffName_txt field to it. Otherwise you will change how searching works
> > for all fields that use the text_general fieldType (you would no longer
> > be able to retrieve documents by searching for individual words in the
> > text). If you want to support both behaviors, you might want to create
> > multiple versions of the field using the copyField feature.
> >
> > Hope this helps.
> > - Andy -
> >
> > On Wed, Jan 12, 2022 at 12:48 PM WU, Zhiqing <z...@ennov.com> wrote:
> >
> >> Hi Andy,
> >>
> >> Loads of thanks for your reply. I am trying to figure out my problem by
> >> following your advice.
> >>
> >> I have installed Solr (8.5) on my computer and added 4 documents into
> >> a core.
> >>
> >> In the 4 documents, the staffName_txt field has been set to "Lindmar
> >> Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob" respectively.
> >>
> >>
> >>
> >> At the beginning, without changing anything in managed-schema, I did two
> >> range queries:
> >>
> >> q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]   result: "Lindmar
> >> Deborah", "Mr Kenyon John" and " Saab Jerry"
> >>
> >> q: staffName_txt:[* TO "Lindmar Deborah"]             result: "Lindmar
> >> Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"
> >>
> >>
> >>
> >> After that, I found the "text_general" fieldType in managed-schema:
> >>
> >>   <fieldType name="text_general" class="solr.TextField"
> >>              positionIncrementGap="100" multiValued="true">
> >>     <analyzer type="index">
> >>       <tokenizer class="solr.StandardTokenizerFactory"/>
> >>       <filter class="solr.StopFilterFactory" words="stopwords.txt"
> >>               ignoreCase="true"/>
> >>       <filter class="solr.LowerCaseFilterFactory"/>
> >>     </analyzer>
> >>     <analyzer type="query">
> >>       <tokenizer class="solr.StandardTokenizerFactory"/>
> >>       <filter class="solr.StopFilterFactory" words="stopwords.txt"
> >>               ignoreCase="true"/>
> >>       <filter class="solr.SynonymGraphFilterFactory" expand="true"
> >>               ignoreCase="true" synonyms="synonyms.txt"/>
> >>       <filter class="solr.LowerCaseFilterFactory"/>
> >>     </analyzer>
> >>   </fieldType>
> >>
> >> ...
> >>
> >>   <dynamicField name="*_txt" type="text_general" indexed="true"
> >>                 stored="true"/>
> >>
> >> ...
> >>
> >> and changed both occurrences of "solr.StandardTokenizerFactory" to
> >> "solr.KeywordTokenizerFactory". I restarted Solr and repeated the two
> >> range queries:
> >>
> >> q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]   result: "Lindmar
> >> Deborah", "Mr Kenyon John" and " Saab Jerry"
> >>
> >> q: staffName_txt:[* TO "Lindmar Deborah"]             result: "Lindmar
> >> Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"
> >>
> >> It seems nothing has changed in the results.
> >>
> >> Is there anything else I could change?
> >>
> >> Looking forward to your reply.
> >>
> >> Zhiqing
> >>
> >> On Fri, 7 Jan 2022 at 18:12, Andy C <andycs...@gmail.com> wrote:
> >>
> >> > The behavior of the range query would depend on how the fieldType used
> >> > by the staffName_txt field is configured.
> >> >
> >> > I believe you will find that TextField is not the fieldType, but the
> >> > base class your fieldType is implemented on.
> >> >
> >> > To use an example from one of the provided example schemas, the
> >> > "_text_" field is defined as using the "text_general" fieldType:
> >> >
> >> >    <field name="_text_" type="text_general" indexed="true"
> >> >           stored="false" multiValued="true"/>
> >> >
> >> > The text_general fieldType is defined as:
> >> >
> >> >     <fieldType name="text_general" class="solr.TextField"
> >> > positionIncrementGap="100" multiValued="true">
> >> >       <analyzer type="index">
> >> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >> >         <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> > words="stopwords.txt" />
> >> >         <filter class="solr.LowerCaseFilterFactory"/>
> >> >       </analyzer>
> >> >       <analyzer type="query">
> >> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >> >         <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> > words="stopwords.txt" />
> >> >         <filter class="solr.SynonymGraphFilterFactory"
> >> > synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >> >         <filter class="solr.LowerCaseFilterFactory"/>
> >> >       </analyzer>
> >> >     </fieldType>
> >> >
> >> > This fieldType definition splits the contents of the field into
> >> > multiple tokens which each get indexed. So for example "Mr Kenyon John"
> >> > would generate 3 tokens: "Mr", "Kenyon" and "John".
> >> >
> >> > If you performed your range query on this field, it would check each
> >> > token separately to see if it was in the specified range. If any token
> >> > was, the document would be retrieved.
> >> >
> >> > If you want the entire contents of the field to be treated as a single
> >> > token, which seems to be your intent, then you should look at using a
> >> > fieldType that is based on the Keyword Tokenizer (see
> >> > https://solr.apache.org/guide/8_3/tokenizers.html#keyword-tokenizer).
> >> >
> >> > - Andy -
> >> >
> >> > On Fri, Jan 7, 2022 at 12:35 PM WU, Zhiqing <z...@ennov.com> wrote:
> >> >
> >> > > Many thanks for your reply. I have changed my query to
> >> > > staffName_txt:["GROSS BOB" TO "LINDMAR DEBORAH"]
> >> > > staffName_txt:["gross bob" TO "lindmar deborah"]
> >> > > staffName_txt:["Gross Bob" TO "lindmar Deborah"]
> >> > > Their "numFound" values are identical (177). Apart from "Mr Kenyon
> >> > > John", my search result contains " Saab Jerry", which is very
> >> > > confusing. Therefore, I think the problem is probably not caused by
> >> > > "character case".
> >> > >
> >> > > On Fri, 7 Jan 2022 at 17:12, Srijan <shree...@gmail.com> wrote:
> >> > >
> >> > > > My guess is inconsistent "character case" (uppercase/lowercase) in
> >> > > > your indexed data vs your search query. For example, I would expect
> >> > > > something like staffName_txt:[ "Gross Bob" TO "lindmar Deborah"] to
> >> > > > return "Mr Kenyon John" as M indeed does lie between G and l.
> >> > > >
> >> > > > On Fri, Jan 7, 2022 at 11:10 AM WU, Zhiqing <z...@ennov.com> wrote:
> >> > > >
> >> > > > > Hello,
> >> > > > > I am learning Solr.
> >> > > > > In "The Standard Query Parser", I find:
> >> > > > > Range queries are not limited to date fields or even numerical
> >> > > > > fields, but can also be used with non-date fields (e.g.
> >> > > > > title:{Aida TO Carmen})
> >> > > > >
> >> > > > > I tried a range query in a Solr database (8.3)
> >> > > > > staffName_txt:[ "Gross Bob" TO "Lindmar Deborah"]
> >> > > > > staffName_txt is defined as a TextField.
> >> > > > > Most search results are correct but "Mr Kenyon John" is also in
> >> > > > > the result list.
> >> > > > > I think 'M' is after 'L' and should not be included in the result.
> >> > > > > May I ask what is wrong in my query? Is there a way to avoid the
> >> > > > > problem?
> >> > > > > Many thanks in advance.
> >> > > > > Kind regards,
> >> > > > > Zhiqing
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
