Hi Andy,

Many thanks for your quick reply.

Yes, you are right. According to the Solr 8.5 documentation, I should not edit "managed-schema" directly. However, when I create a new core (bin/solr create -c newcore), I can only find managed-schema in the server/solr/newcore/conf folder, and I cannot find schema.xml in any folder belonging to the core. Some webpages mention renaming "managed-schema" to schema.xml.

What I can change in managed-schema via the Schema API seems limited: I can add fields, but I do not know how to change "solr.StandardTokenizerFactory" to "solr.KeywordTokenizerFactory" via the Schema API. In the Solr UI I can only find "Add Field", "Add Dynamic Field" and "Add Copy Field" (after clicking "Schema" above "Segments info"); I have not found anything like "Add FieldType".

After I installed Solr, it did not have a core. Therefore, I created a core (newcore, empty, without any documents) and then added 4 new documents via the "Documents" screen of the Solr UI. Now that the documents have been added, do I need to do anything to the index?
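On the Schema API question above: the Solr Reference Guide documents an add-field-type command for the Schema API, so a new fieldType does not have to go through the admin UI. Below is a minimal sketch in Python of what such a request could look like. Several things in it are assumptions and not taken from this thread: Solr listening on localhost:8983, the type name keyword_lower (hypothetical), and the choice to declare staffName_txt as an explicit field so that it no longer falls under the *_txt dynamic field. As noted elsewhere in the thread, documents indexed before a schema change need to be deleted and re-added.

```python
import json
import urllib.request

# Assumed location of the Schema API for the core "newcore" (default port).
SCHEMA_URL = "http://localhost:8983/solr/newcore/schema"


def build_commands(type_name="keyword_lower", field_name="staffName_txt"):
    """Build a Schema API request body with two commands.

    type_name is a hypothetical name chosen for illustration.
    """
    return {
        # New fieldType: the whole field value becomes one lowercased token.
        "add-field-type": {
            "name": type_name,
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {"class": "solr.KeywordTokenizerFactory"},
                "filters": [{"class": "solr.LowerCaseFilterFactory"}],
            },
        },
        # Explicit field definition that uses the new type.
        "add-field": {
            "name": field_name,
            "type": type_name,
            "indexed": True,
            "stored": True,
        },
    }


def post_commands(commands, url=SCHEMA_URL):
    """POST the commands as JSON; needs a running Solr, so it is not called here."""
    request = urllib.request.Request(
        url,
        data=json.dumps(commands).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the request body so it can be inspected before sending anything.
    print(json.dumps(build_commands(), indent=2))
```

Printing the body first makes it easy to compare against the fieldType XML in managed-schema before calling post_commands against a real core.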
Yes, I understand that I should create a new fieldType rather than modify the text_general fieldType. If I create a new fieldType, can I set the tokenizer class to "solr.KeywordTokenizerFactory"? I will remove the StopFilterFactory and SynonymGraphFilterFactory filters.

I am new to Solr, so some of my operations might be wrong.

Zhiqing

On Wed, 12 Jan 2022 at 19:12, Andy C <andycs...@gmail.com> wrote:

> Also it doesn't make sense to use the StopFilterFactory or
> SynonymGraphFilterFactory filters in conjunction with the
> KeywordTokenizerFactory, so these should be removed from the fieldType
> definition (personally I would never make use of the StopFilterFactory,
> except in specialized situations).
>
> - Andy -
>
> On Wed, Jan 12, 2022 at 2:02 PM Andy C <andycs...@gmail.com> wrote:
>
> > How are you changing the managed-schema? I have never used the managed
> > schema feature myself, but according to the documentation
> > (https://solr.apache.org/guide/8_5/overview-of-documents-fields-and-schema-design.html#solrs-schema-file)
> > it should never be directly edited. Not sure how it is supposed to be
> > updated.
> >
> > Did you recreate your indexes after changing the schema (delete the
> > existing indexes and re-add your 4 documents)? This would be necessary, as
> > the schema configuration at the time the documents are ingested would
> > determine how they are indexed.
> >
> > Also, you may want to consider creating a new fieldType rather than
> > modifying the text_general fieldType, and explicitly map the staffName_txt
> > field to it. Otherwise you will change how searching works for all fields
> > that use the text_general fieldType (you would no longer be able to
> > retrieve documents by searching for individual words in the text). If you
> > want to support both behaviors, you might want to create multiple versions
> > of the field using the copyField feature.
> >
> > Hope this helps.
> > - Andy -
> >
> > On Wed, Jan 12, 2022 at 12:48 PM WU, Zhiqing <z...@ennov.com> wrote:
> >
> >> Hi Andy,
> >>
> >> Loads of thanks for your reply. I am trying to figure out my problem by
> >> following your advice.
> >>
> >> I have installed Solr (8.5) on my computer and added 4 documents into
> >> a core.
> >>
> >> In the 4 documents, staffName_txt field has been set to "Lindmar Deborah",
> >> "Mr Kenyon John", " Saab Jerry" and "Gross Bob" respectively.
> >>
> >> At the beginning, without changing anything in managed-schema, I did two
> >> range queries:
> >>
> >> q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]
> >> result: "Lindmar Deborah", "Mr Kenyon John" and " Saab Jerry"
> >>
> >> q: staffName_txt:[* TO "Lindmar Deborah"]
> >> result: "Lindmar Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"
> >>
> >> After that, I find the fieldType of "text_general" in managed-schema:
> >>
> >> <fieldType name="text_general" class="solr.TextField"
> >>     positionIncrementGap="100" multiValued="true">
> >>   <analyzer type="index">
> >>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>     <filter class="solr.StopFilterFactory" words="stopwords.txt"
> >>         ignoreCase="true"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>   </analyzer>
> >>   <analyzer type="query">
> >>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>     <filter class="solr.StopFilterFactory" words="stopwords.txt"
> >>         ignoreCase="true"/>
> >>     <filter class="solr.SynonymGraphFilterFactory" expand="true"
> >>         ignoreCase="true" synonyms="synonyms.txt"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>   </analyzer>
> >> </fieldType>
> >>
> >> ...
> >>
> >> <dynamicField name="*_txt" type="text_general" indexed="true"
> >>     stored="true"/>
> >>
> >> ...
> >>
> >> and change two "solr.StandardTokenizerFactory" to
> >> "solr.KeywordTokenizerFactory".
> >> I restart my Solr and repeat two range queries:
> >>
> >> q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]
> >> result: "Lindmar Deborah", "Mr Kenyon John" and " Saab Jerry"
> >>
> >> q: staffName_txt:[* TO "Lindmar Deborah"]
> >> result: "Lindmar Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"
> >>
> >> It seems nothing has changed in the results.
> >>
> >> Is there anything else I could change?
> >>
> >> Looking forward to your reply.
> >>
> >> Zhiqing
> >>
> >> On Fri, 7 Jan 2022 at 18:12, Andy C <andycs...@gmail.com> wrote:
> >>
> >> > The behavior of the range query would depend on how the fieldType used by
> >> > the staffName_txt is configured.
> >> >
> >> > I believe you will find that TextField is not the fieldType, but the base
> >> > class your fieldType is implemented on.
> >> >
> >> > To use an example from one of the provided example schemas, the "_text_"
> >> > field is defined as using the "text_general" fieldType
> >> >
> >> > <field name="_text_" type="text_general" indexed="true" stored="false"
> >> >     multiValued="true"/>
> >> >
> >> > The text_general fieldType is defined as:
> >> >
> >> > <fieldType name="text_general" class="solr.TextField"
> >> >     positionIncrementGap="100" multiValued="true">
> >> >   <analyzer type="index">
> >> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> >         words="stopwords.txt" />
> >> >     <filter class="solr.LowerCaseFilterFactory"/>
> >> >   </analyzer>
> >> >   <analyzer type="query">
> >> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> >         words="stopwords.txt" />
> >> >     <filter class="solr.SynonymGraphFilterFactory"
> >> >         synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >> >     <filter class="solr.LowerCaseFilterFactory"/>
> >> >   </analyzer>
> >> > </fieldType>
> >> >
> >> > This fieldType definition splits the contents of the field into
> >> > multiple tokens which each get indexed. So for example "Mr Kenyon John"
> >> > would generate 3 tokens: "Mr", "Kenyon" and "John".
> >> >
> >> > If you performed your range query on this field, it would check each token
> >> > separately to see if it was in the specified range. If any token was, the
> >> > document would be retrieved.
> >> >
> >> > If you want the entire contents of the field to be treated as a single
> >> > token, which seems to be your intent, then you should look at using a
> >> > fieldType that is based on the Keyword Tokenizer (see
> >> > https://solr.apache.org/guide/8_3/tokenizers.html#keyword-tokenizer).
> >> >
> >> > - Andy -
> >> >
> >> > On Fri, Jan 7, 2022 at 12:35 PM WU, Zhiqing <z...@ennov.com> wrote:
> >> >
> >> > > Many thanks for your reply. I have changed my query to
> >> > > staffName_txt:["GROSS BOB" TO "LINDMAR DEBORAH"]
> >> > > staffName_txt:["gross bob" TO "lindmar deborah"]
> >> > > staffName_txt:["Gross Bob" TO "lindmar Deborah"]
> >> > > Their "numFound" are identical (177). Apart from "Mr Kenyon John", my
> >> > > search result contains " Saab Jerry", which is very confusing.
> >> > > Therefore, I think the problem is probably not because of "character case"
> >> > >
> >> > > On Fri, 7 Jan 2022 at 17:12, Srijan <shree...@gmail.com> wrote:
> >> > >
> >> > > > My guess is inconsistent "character case" (uppercase/lowercase) in your
> >> > > > indexed data vs your search query. For example, I would expect something
> >> > > > like staffName_txt:[ "Gross Bob" TO "lindmar Deborah"] to return "Mr
> >> > > > Kenyon John" as M indeed does lie between G and l.
> >> > > >
> >> > > > On Fri, Jan 7, 2022 at 11:10 AM WU, Zhiqing <z...@ennov.com> wrote:
> >> > > >
> >> > > > > Hello,
> >> > > > > I am learning Solr.
> >> > > > > In "The Standard Query Parser", I find:
> >> > > > > Range queries are not limited to date fields or even numerical fields,
> >> > > > > but can also be used with non-date fields (e.g. title:{Aida TO Carmen})
> >> > > > >
> >> > > > > I tried a range query in a Solr database (8.3)
> >> > > > > staffName_txt:[ "Gross Bob" TO "Lindmar Deborah"]
> >> > > > > staffName_txt is defined as a TextField.
> >> > > > > Most searched results are correct but "Mr Kenyon John" is also in the
> >> > > > > result list.
> >> > > > > I think 'M' is after 'L' and should not be included in the result.
> >> > > > > May I ask what is wrong in my query? Is there a way to avoid the
> >> > > > > problem?
> >> > > > > Many thanks in advance.
> >> > > > > Kind regards,
> >> > > > > Zhiqing
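
The per-token matching described in the thread (a document is returned if any one of its tokens falls inside the range) can be reproduced with a small simulation. The Python sketch below is not Solr code and rests on stated assumptions: splitting on whitespace stands in for the StandardTokenizer, stop words and synonyms are ignored, and each range endpoint is lowercased as one whole string before comparison. The four names are the ones from the thread.

```python
NAMES = ["Lindmar Deborah", "Mr Kenyon John", " Saab Jerry", "Gross Bob"]


def tokens_standard(value):
    # Rough stand-in for StandardTokenizer + LowerCaseFilter:
    # one lowercased token per whitespace-separated word.
    return [word.lower() for word in value.split()]


def tokens_keyword(value):
    # Rough stand-in for KeywordTokenizer + LowerCaseFilter:
    # the entire value is a single lowercased token (spaces are kept).
    return [value.lower()]


def range_matches(value, lower, upper, tokenize):
    # Assumption: endpoints are compared as whole lowercased strings.
    # None means an open end, like * in the query syntax.
    low = lower.lower() if lower is not None else None
    high = upper.lower() if upper is not None else None
    for token in tokenize(value):
        if (low is None or token >= low) and (high is None or token <= high):
            return True  # one token inside the range is enough
    return False


def search(lower, upper, tokenize):
    return [name for name in NAMES if range_matches(name, lower, upper, tokenize)]


if __name__ == "__main__":
    # Per-word tokens: "kenyon" and "jerry" fall between the endpoints,
    # while "gross" sorts before "gross bob", so "Gross Bob" is missed.
    print(search("Gross Bob", "Lindmar Deborah", tokens_standard))
    # -> ['Lindmar Deborah', 'Mr Kenyon John', ' Saab Jerry']

    # Open lower end: every name has a token at or below the upper endpoint.
    print(search(None, "Lindmar Deborah", tokens_standard))
    # -> ['Lindmar Deborah', 'Mr Kenyon John', ' Saab Jerry', 'Gross Bob']

    # Whole-value tokens: only names between the endpoints as full strings.
    print(search("Gross Bob", "Lindmar Deborah", tokens_keyword))
    # -> ['Lindmar Deborah', 'Gross Bob']
```

Under these assumptions, the first two calls give the same result lists that were reported for the text_general field. In the whole-value case " Saab Jerry" is excluded only because its leading space sorts before any letter, so trimming such values before indexing may be worth considering.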