Also, it doesn't make sense to use the StopFilterFactory or SynonymGraphFilterFactory filters in conjunction with the KeywordTokenizerFactory, so these should be removed from the fieldType definition (personally I would never make use of the StopFilterFactory, except in specialized situations).
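
As a rough, untested sketch of what such a definition might look like: the fieldType below keeps the whole field value as a single lowercased token, and staffName_txt is mapped to it explicitly so that the *_txt dynamicField should no longer apply to it. The fieldType name text_keyword_lc is made up for illustration, and the attribute values are assumptions copied from the text_general example.

```xml
<!-- Hypothetical fieldType name; one <analyzer> is used for both indexing and querying -->
<fieldType name="text_keyword_lc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Emits the entire field value as one token, e.g. "Mr Kenyon John" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercases that token so range comparisons ignore case -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Explicit field definition (assumed to take precedence over the *_txt dynamicField) -->
<field name="staffName_txt" type="text_keyword_lc" indexed="true" stored="true"/>
```

The existing documents would need to be re-indexed after a change like this for it to have any effect.
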
- Andy -

On Wed, Jan 12, 2022 at 2:02 PM Andy C <andycs...@gmail.com> wrote:

> How are you changing the managed-schema? I have never used the managed
> schema feature myself, but according to the documentation (
> https://solr.apache.org/guide/8_5/overview-of-documents-fields-and-schema-design.html#solrs-schema-file)
> it should never be directly edited. Not sure how it is supposed to be
> updated.
>
> Did you recreate your indexes after changing the schema (delete the
> existing indexes and re-add your 4 documents)? This would be necessary, as
> the schema configuration at the time the documents are ingested would
> determine how they are indexed.
>
> Also, you may want to consider creating a new fieldType rather than
> modifying the text_general fieldType, and explicitly mapping the
> staffName_txt field to it. Otherwise you will change how searching works
> for all fields that use the text_general fieldType (you would no longer be
> able to retrieve documents by searching for individual words in the text).
> If you want to support both behaviors, you might want to create multiple
> versions of the field using the copyField feature.
>
> Hope this helps.
> - Andy -
>
> On Wed, Jan 12, 2022 at 12:48 PM WU, Zhiqing <z...@ennov.com> wrote:
>
>> Hi Andy,
>>
>> Loads of thanks for your reply. I am trying to figure out my problem by
>> following your advice.
>>
>> I have installed Solr (8.5) on my computer and added 4 documents into
>> a core.
>>
>> In the 4 documents, staffName_txt field has been set to "Lindmar Deborah",
>> "Mr Kenyon John", " Saab Jerry" and "Gross Bob" respectively.
>>
>> At the beginning, without changing anything in managed-schema, I did two
>> range queries:
>>
>> q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]
>> result: "Lindmar Deborah", "Mr Kenyon John" and " Saab Jerry"
>>
>> q: staffName_txt:[* TO "Lindmar Deborah"]
>> result: "Lindmar Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"
>>
>> After that, I found the fieldType "text_general" in managed-schema:
>>
>> <fieldType name="text_general" class="solr.TextField"
>>     positionIncrementGap="100" multiValued="true">
>>   <analyzer type="index">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" words="stopwords.txt"
>>         ignoreCase="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" words="stopwords.txt"
>>         ignoreCase="true"/>
>>     <filter class="solr.SynonymGraphFilterFactory" expand="true"
>>         ignoreCase="true" synonyms="synonyms.txt"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> ...
>>
>> <dynamicField name="*_txt" type="text_general" indexed="true"
>>     stored="true"/>
>>
>> ...
>>
>> and changed the two "solr.StandardTokenizerFactory" entries to
>> "solr.KeywordTokenizerFactory". I restarted my Solr and repeated the two
>> range queries:
>>
>> q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]
>> result: "Lindmar Deborah", "Mr Kenyon John" and " Saab Jerry"
>>
>> q: staffName_txt:[* TO "Lindmar Deborah"]
>> result: "Lindmar Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"
>>
>> It seems nothing has changed in the results.
>>
>> Is there anything else I could change?
>>
>> Looking forward to your reply.
>>
>> Zhiqing
>>
>> On Fri, 7 Jan 2022 at 18:12, Andy C <andycs...@gmail.com> wrote:
>>
>> > The behavior of the range query would depend on how the fieldType used by
>> > the staffName_txt field is configured.
>> >
>> > I believe you will find that TextField is not the fieldType, but the base
>> > class your fieldType is implemented on.
>> >
>> > To use an example from one of the provided example schemas, the "_text_"
>> > field is defined as using the "text_general" fieldType:
>> >
>> > <field name="_text_" type="text_general" indexed="true" stored="false"
>> >     multiValued="true"/>
>> >
>> > The text_general fieldType is defined as:
>> >
>> > <fieldType name="text_general" class="solr.TextField"
>> >     positionIncrementGap="100" multiValued="true">
>> >   <analyzer type="index">
>> >     <tokenizer class="solr.StandardTokenizerFactory"/>
>> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >         words="stopwords.txt" />
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >   </analyzer>
>> >   <analyzer type="query">
>> >     <tokenizer class="solr.StandardTokenizerFactory"/>
>> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >         words="stopwords.txt" />
>> >     <filter class="solr.SynonymGraphFilterFactory"
>> >         synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >   </analyzer>
>> > </fieldType>
>> >
>> > This fieldType definition splits the contents of the field into multiple
>> > tokens which each get indexed. So for example "Mr Kenyon John" would
>> > generate 3 tokens: "mr", "kenyon" and "john" (the LowerCaseFilterFactory
>> > lowercases them).
>> >
>> > If you performed your range query on this field, it would check each token
>> > separately to see if it was in the specified range. If any token was, the
>> > document would be retrieved.
>> >
>> > If you want the entire contents of the field to be treated as a single
>> > token, which seems to be your intent, then you should look at using a
>> > fieldType that is based on the Keyword Tokenizer (see
>> > https://solr.apache.org/guide/8_3/tokenizers.html#keyword-tokenizer).
>> >
>> > - Andy -
>> >
>> > On Fri, Jan 7, 2022 at 12:35 PM WU, Zhiqing <z...@ennov.com> wrote:
>> >
>> > > Many thanks for your reply. I have changed my query to
>> > > staffName_txt:["GROSS BOB" TO "LINDMAR DEBORAH"]
>> > > staffName_txt:["gross bob" TO "lindmar deborah"]
>> > > staffName_txt:["Gross Bob" TO "lindmar Deborah"]
>> > > Their "numFound" are identical (177). Apart from "Mr Kenyon John", my
>> > > search result contains " Saab Jerry", which is very confusing.
>> > > Therefore, I think the problem is probably not because of "character
>> > > case".
>> > >
>> > > On Fri, 7 Jan 2022 at 17:12, Srijan <shree...@gmail.com> wrote:
>> > >
>> > > > My guess is inconsistent "character case" (uppercase/lowercase) in your
>> > > > indexed data vs your search query. For example, I would expect something
>> > > > like staffName_txt:[ "Gross Bob" TO "lindmar Deborah"] to return "Mr
>> > > > Kenyon John" as M indeed does lie between G and l.
>> > > >
>> > > > On Fri, Jan 7, 2022 at 11:10 AM WU, Zhiqing <z...@ennov.com> wrote:
>> > > >
>> > > > > Hello,
>> > > > > I am learning Solr.
>> > > > > In "The Standard Query Parser", I find:
>> > > > > Range queries are not limited to date fields or even numerical fields,
>> > > > > but can also be used with non-date fields (e.g. title:{Aida TO Carmen})
>> > > > >
>> > > > > I tried a range query in a Solr database (8.3)
>> > > > > staffName_txt:[ "Gross Bob" TO "Lindmar Deborah"]
>> > > > > staffName_txt is defined as a TextField.
>> > > > > Most search results are correct but "Mr Kenyon John" is also in the
>> > > > > result list.
>> > > > > I think 'M' is after 'L' and should not be included in the result.
>> > > > > May I ask what is wrong in my query? Is there a way to avoid the
>> > > > > problem?
>> > > > > Many thanks in advance.
>> > > > > Kind regards,
>> > > > > Zhiqing