Update processor on nested doc
Hi, I want to use a Solr update processor on a nested document. For example, this is the nested doc: { "id": 3, "texts": [ { "id": 2, "tmp": ["dsa", "dsa", "ewq"] } ] } and I want to apply the update processor (UP) UniqFieldsUpdateProcessorFactory to the texts.tmp field. How can I do this?
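For illustration only, here is a client-side workaround sketch in Python, not a Solr update processor configuration: it removes duplicate values from the field in every child document before the document is sent for indexing. The helper name dedupe_field and its parameters are hypothetical, and it assumes the children live under a list-valued key ("texts" in the example above).

```python
from typing import Any, Dict


def dedupe_field(doc: Dict[str, Any], field: str, child_key: str) -> Dict[str, Any]:
    """Return a copy of doc with duplicate values removed from `field`.

    The function recurses into child documents stored as a list under
    `child_key`. The order of first occurrence is preserved.
    """
    result: Dict[str, Any] = {}
    for key, value in doc.items():
        if key == child_key and isinstance(value, list):
            # Recurse into each nested child document.
            result[key] = [
                dedupe_field(child, field, child_key) if isinstance(child, dict) else child
                for child in value
            ]
        elif key == field and isinstance(value, list):
            # dict.fromkeys keeps the first occurrence of each value, in order.
            result[key] = list(dict.fromkeys(value))
        else:
            result[key] = value
    return result


doc = {"id": 3, "texts": [{"id": 2, "tmp": ["dsa", "dsa", "ewq"]}]}
cleaned = dedupe_field(doc, field="tmp", child_key="texts")
print(cleaned)  # {'id': 3, 'texts': [{'id': 2, 'tmp': ['dsa', 'ewq']}]}
```

The cleaned document can then be posted to Solr as usual; the original doc is left unchanged.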
Unique key field
Hi All,
We have an id field (unique key) in our schema as follows:
We are planning to introduce docValues on this field to save field cache space. My understanding is that, going forward, all sorting or faceting done on the id field will make use of docValues, and since the stored flag is true for this field, normal searches will continue to retrieve id from stored fields. The field cache will then no longer hold the id field. Please correct me if my understanding is wrong.
Also, will there be any other performance impact from this change, for example, queries taking longer than before?
Thanks & Regards,
Poorna
Re: Unique key field
The "string" field type usually has docValues="true" by default.
On Tue, Jun 7, 2022 at 11:22 AM Poorna Murali wrote:
> We are planning to introduce docValues to this field to save fieldcache space.
--
Vincenzo D'Amore
RE: Re: Unique key field
If docValues are enabled by default for the string field type, then sort queries on the field should not occupy the field cache and should rely on docValues instead. But in our case, sorting is filling up the field cache, and that is the reason we are planning to enable docValues for that field. Is this functionality version specific?
On 2022/06/07 12:35:39 Vincenzo D'Amore wrote:
> "string" field type usually has "docValues"=true as default.
"this.stopWords" is null
I had an 8.11.1 implementation in progress when 9.0 came out, and am trying to convert it so we don't go live on an already outdated version. I'm having trouble adding documents to the index that worked fine with 8.11.1. Shortened error is below:

2022-06-07 13:49:24.190 ERROR (qtp554868511-21) [ x:sku] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception writing document id 6-1-TB-0701 to the index; possible analysis error. => org.apache.solr.common.SolrException: Exception writing document id 6-1-TB-0701 to the index; possible analysis error.
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.lucene.analysis.CharArraySet.contains(char[], int, int)" because "this.stopWords" is null
    at org.apache.lucene.analysis.StopFilter.accept(StopFilter.java:97) ~[?:?]
    at org.apache.lucene.analysis.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:52) ~[?:?]
    at org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:37) ~[?:?]
    at org.apache.lucene.index.IndexingChain$PerField.invert(IndexingChain.java:1142) ~[?:?]
    at org.apache.lucene.index.IndexingChain.processField(IndexingChain.java:729) ~[?:?]
    at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:620) ~[?:?]
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:239) ~[?:?]
    at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:432) ~[?:?]
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1530) ~[?:?]
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1519) ~[?:?]
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:1046) ~[?:?]
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:416) ~[?:?]
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:369) ~[?:?]
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:300) ~[?:?]
    ... 80 more

I have double checked all the stop filters in my schema.xml in my configset, and they all seem fine. The import should only be using text_general, which is configured like this:

I can't figure out what the problem is, or how to do more detailed debugging to find it. Any help would be greatly appreciated.
Re: Re: Unique key field
Check your schema version attribute:
https://github.com/apache/solr/blob/main/solr/server/solr/configsets/_default/conf/managed-schema.xml#L41
On Tue, Jun 7, 2022 at 9:43 AM Poorna Murali wrote:
> Is this functionality version specific?
--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
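As a rough sketch of that check in Python: the version is an attribute on the root schema element, so it can be read with the standard library XML parser. The sample schema string and the helper name schema_version are made up for illustration; in practice you would read your own schema.xml or managed-schema file instead.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def schema_version(schema_xml: str) -> Optional[str]:
    """Return the `version` attribute of the root <schema> element, if present."""
    root = ET.fromstring(schema_xml)
    return root.get("version")


# Hypothetical minimal schema, used only to exercise the helper.
sample = '<schema name="example" version="1.6"><uniqueKey>id</uniqueKey></schema>'
print(schema_version(sample))  # 1.6
```

A result of None would mean the root element carries no version attribute at all.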
Re: Re: Unique key field
Sorry, I was thinking this was about docValues as stored, but on re-reading, that might not be the case (a mail client adding "RE:" is messing up the threading here). Another common problem arises if you are using old schemas from old versions: some of the "by default" behavior comes from the default schemas, so if an old default schema was your original source, these things may not be up to date.
On Tue, Jun 7, 2022 at 11:23 AM Gus Heck wrote:
> check your schema version attribute
> https://github.com/apache/solr/blob/main/solr/server/solr/configsets/_default/conf/managed-schema.xml#L41
--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
Re: "this.stopWords" is null
Commenting out the stop filter allowed documents to be indexed, confirming it was actually the problem. But then queries fail because the synonyms cannot be found, for what looks like a similar reason. I've also tried switching the files to use absolute paths like below, but that also does not work:

It certainly seems like the Solr configuration is simply not initializing the Lucene filters correctly.
On Tue, Jun 7, 2022 at 9:22 AM Thomas Woodard wrote:
> Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.lucene.analysis.CharArraySet.contains(char[], int, int)" because "this.stopWords" is null
> The import should only be using text_general, which is configured like this:
> positionIncrementGap="100" multiValued="true">
> ignoreCase="true" expand="true"/>
Re: Re: Unique key field
The easiest thing to do is double check your schema.xml and see how the field type "string" is defined.
On Tue, Jun 7, 2022 at 3:44 PM Poorna Murali wrote:
> Is this functionality version specific?
--
Vincenzo D'Amore
RE: Re: Re: Unique key field
Thanks Gus! The only way to check if docValues is enabled for a field is by ensuring that the field cache is not getting populated while we do a sort on that field. Please confirm if my understanding is correct.
On 2022/06/07 15:27:23 Gus Heck wrote:
> Sorry, was thinking this was about doc values as stored... but re-reading that might not be the case (a mail client adding RE: is messing up the threading here)...
Re: Re: Re: Unique key field
You can ask Luke:
http://localhost:8983/solr/techproducts/admin/luke?show=all&fl=id

On Solr 8.11.1, I get this snippet as part of the output:

"fields":{
  "id":{
    "type":"string",
    "schema":"I-S-U-OF-l",

If it had docValues="true" in the schema, the fourth flag would be a D instead of a dash.

    "schema":"I-SDU-OF-l",

Or you can use the schema browser to get the same information in a more visual way:
http://localhost:8983/solr/#/techproducts/schema?field=id

Thomas

On Tue, Jun 7, 2022 at 17:40, Poorna Murali wrote:
> The only way to check if docValues is enabled for a field is by ensuring that the field cache is not getting populated while we do a sort on that field. Please confirm if my understanding is correct.
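The Luke check described above could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the base URL and core name must match your own installation, and the position of the docValues flag (fourth character of the "schema" string) is taken from the Solr 8.11.1 output quoted in this thread, so it may differ on other versions.

```python
import json
import urllib.parse
import urllib.request


def has_doc_values(schema_flags: str) -> bool:
    """True if the fourth flag of a Luke "schema" string is D (docValues)."""
    return len(schema_flags) > 3 and schema_flags[3] == "D"


def fetch_schema_flags(base_url: str, core: str, field: str) -> str:
    """Ask the Luke handler for one field and return its "schema" flag string.

    base_url is something like "http://localhost:8983/solr" (an assumption
    about your setup). This function needs a running Solr instance.
    """
    query = urllib.parse.urlencode({"show": "all", "fl": field, "wt": "json"})
    url = f"{base_url}/{core}/admin/luke?{query}"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data["fields"][field]["schema"]


# Offline check against the two flag strings quoted in the thread.
print(has_doc_values("I-S-U-OF-l"))  # False
print(has_doc_values("I-SDU-OF-l"))  # True
```

Against a live instance, has_doc_values(fetch_schema_flags("http://localhost:8983/solr", "techproducts", "id")) would combine the two steps.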
RE: Re: Re: Re: Unique key field
Thanks Thomas! I will check the same.
On 2022/06/07 19:01:37 Thomas Corthals wrote:
> You can ask Luke:
> http://localhost:8983/solr/techproducts/admin/luke?show=all&fl=id
RE: Re: Re: Unique key field
Thanks Vincenzo! I will check the schema too.
On 2022/06/07 16:08:21 Vincenzo D'Amore wrote:
> The easiest thing to do is double check your schema.xml and see how the field type "string" is defined.