Thanks for reporting back. Found the issue at last, including the
magic number! Will post a fix for this shortly.

https://github.com/apache/solr/blob/aec6e8f750037fea5f8d01dc49dabf28bf512d68/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L568-L569
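
For anyone following along, here is a minimal, self-contained sketch of how a hard-coded page-size guard can produce this kind of 50-vs-51 cliff. It is an illustration only. The class, the method names, and the PREFETCH_LIMIT constant are hypothetical and are not copied from QueryComponent. It assumes that the guard gates a stored-field prefetch on the number of rows returned, and that prefetched documents are only reused when a documentCache exists to hold them.

```java
/** Illustration only: every name here is hypothetical, not Solr's actual API. */
final class PrefetchThresholdSketch {

  /** Stand-in for the "magic number" seen in the benchmarks (assumed, not copied). */
  static final int PREFETCH_LIMIT = 50;

  /** Assumed guard: prefetch stored fields only when the result page is small. */
  static boolean shouldPrefetch(int rows) {
    return rows <= PREFETCH_LIMIT;
  }

  /**
   * Toy cost model: counts stored-field reads for one request. A prefetched
   * document is reused only if a document cache exists to hold it; otherwise
   * the response pass has to read it again.
   */
  static int storedFieldReads(int rows, boolean documentCacheEnabled) {
    boolean prefetched = shouldPrefetch(rows);
    int reads = prefetched ? rows : 0; // prefetch pass
    if (!prefetched || !documentCacheEnabled) {
      reads += rows; // response pass cannot reuse anything, so it reads again
    }
    return reads;
  }

  public static void main(String[] args) {
    for (int rows : new int[] {50, 51}) {
      for (boolean cache : new boolean[] {false, true}) {
        System.out.printf(
            "rows=%d documentCache=%b prefetch=%b reads=%d%n",
            rows, cache, shouldPrefetch(rows), storedFieldReads(rows, cache));
      }
    }
  }
}
```

In this toy model, rows=50 without a documentCache does twice the reads of rows=51. It only shows where the cliff sits and why enabling documentCache hides it. It does not try to account for the much larger gap measured in this thread.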

On Mon, Jul 8, 2024 at 9:05 AM Oleksandr Tkachuk <sasha547...@gmail.com> wrote:
>
> Hello.
> Unfortunately it didn't help. Still a huge difference between 50 vs 51
> and disabling enableLazyFieldLoading in solrconfig.xml still helps.
>
> solr-impl 10.0.0-SNAPSHOT 011d713a884559e3efeaa69e4f3c8dd8e630ff22
> [snapshot build, details omitted]
> cat solr/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java | head -370 | tail -13
>     SolrDocumentStoredFieldVisitor(Set<String> toLoad, IndexReader reader, int docId) {
>       super(toLoad);
>       this.docId = docId;
>       this.doc = getDocument();
>       if (documentCache == null) {
>         // lazy loading makes no sense if we don't have a `documentCache`
>         this.lazyFieldProducer = null;
>         this.addLargeFieldsLazily = false;
>       } else {
>         this.lazyFieldProducer =
>             toLoad != null && enableLazyFieldLoading ? new LazyDocument(reader, docId) : null;
>         this.addLargeFieldsLazily = !largeFields.isEmpty();
>       }
>
>
> On Wed, Jun 26, 2024 at 5:10 AM Michael Gibney
> <mich...@michaelgibney.net> wrote:
> >
> > FYI:
> > https://issues.apache.org/jira/browse/SOLR-17349
> > https://github.com/apache/solr/pull/2535
> >
> > I'm curious whether this helps!
> >
> > On Fri, Jun 21, 2024 at 3:08 PM Oleksandr Tkachuk <sasha547...@gmail.com> wrote:
> > >
> > > >If you're set up to try running a patched version on your data, I'm 
> > > >curious to know if this will help.
> > > I'll be happy to do this.
> > >
> > > >But maybe it's not so much a magic threshold as arbitrary, and specific 
> > > >to the data you're evaluating over.
> > > Well, I tested this case on the collection that you remembered, with a
> > > large number of fields (564133 at this moment) and more documents
> > > there (~68 million documents). The number of documents and their
> > > content are significantly different there from where I tested
> > > previously. And I can say that I was quickly able to reproduce the
> > > problem with magic number 50(51), although not as noticeable as in the
> > > previous one. I confirmed this on absolutely any cardinality and any
> > > variance using hey (I’m more than sure that it will be reproduced on
> > > > any other benchmark). Although QTime differed little or not at all for
> > > > single queries, the difference grows significantly under load (though it
> > > > is still easier to reproduce on fl= data with a skewed value distribution
> > > > and low cardinality), for example:
> > > Huge cardinality, values almost completely unique:
> > > ./hey_linux_amd64 -n 10000 -c 5 -T "application/json"
> > > 'http://localhost:8983/solr/col/select?fl=fld1&wt=json&q=fld1:"fld1value"&start=0&rows=50'
> > >   Slowest:      0.0024 secs
> > >   Fastest:      0.0009 secs
> > >   Average:      0.0013 secs
> > >   Requests/sec: 3768.1874
> > >
> > > ./hey_linux_amd64 -n 10000 -c 5 -T "application/json"
> > > 'http://localhost:8983/solr/col/select?fl=fld1&wt=json&q=fld1:"fld1value"&start=0&rows=51'
> > >   Slowest:      0.0018 secs
> > >   Fastest:      0.0007 secs
> > >   Average:      0.0009 secs
> > >   Requests/sec: 5620.4994
> > >
> > > Just 1.5x diff
> > >
> > >
> > > "fld2":["v1",30501964,"v2",4202177,"v3",210886] :
> > > ./hey_linux_amd64 -n 10000 -c 5 -T "application/json"
> > > 'http://localhost:8983/solr/col/select?fl=fld2&wt=json&q=fld2:"v3"&start=0&rows=50'
> > >   Slowest:      0.1198 secs
> > >   Fastest:      0.0013 secs
> > >   Average:      0.0019 secs
> > >   Requests/sec: 2641.0227
> > >
> > > ./hey_linux_amd64 -n 10000 -c 5 -T "application/json"
> > > 'http://localhost:8983/solr/col/select?fl=fld2&wt=json&q=fld2:"v3"&start=0&rows=51'
> > >   Slowest:      0.0051 secs
> > >   Fastest:      0.0003 secs
> > >   Average:      0.0003 secs
> > >   Requests/sec: 14610.4688
> > >
> > > ./hey_linux_amd64 -n 10000 -c 5 -T "application/json"
> > > 'http://localhost:8983/solr/col/select?fl=fld2&wt=json&q=fld2:"v1"&start=0&rows=50'
> > >   Slowest:      0.0059 secs
> > >   Fastest:      0.0008 secs
> > >   Average:      0.0010 secs
> > >   Requests/sec: 4795.5539
> > >
> > > ./hey_linux_amd64 -n 10000 -c 5 -T "application/json"
> > > 'http://localhost:8983/solr/col/select?fl=fld2&wt=json&q=fld2:"v1"&start=0&rows=51'
> > >   Slowest:      0.0010 secs
> > >   Fastest:      0.0003 secs
> > >   Average:      0.0003 secs
> > >   Requests/sec: 14726.7978
> > >
> > > 4-6x diff
> > >
> > > On Fri, Jun 21, 2024 at 4:59 PM Michael Gibney
> > > <mich...@michaelgibney.net> wrote:
> > > >
> > > > Interesting! If turning off lazy field loading helps, I think I have a
> > > > trivial patch that may fix this (i.e. without requiring the workaround
> > > > of disabling lazy field loading -- which, as you say, makes no sense
> > > > to have in effect without the documentCache). The only thing that had
> > > > been stopping me from suggesting this patch right off the bat was the
> > > > "magic" threshold of 50, which I couldn't explain at all. But maybe
> > > > it's not so much a magic threshold as arbitrary, and specific to the
> > > > data you're evaluating over. I'll open an issue/PR more narrowly
> > > > scoped to the change. I'd say you could open the issue, except I still
> > > > don't fully understand the connection between the change I'm
> > > > considering and the behavior you're seeing -- just that they seem very
> > > > likely to be connected. If you're set up to try running a patched
> > > > version on your data, I'm curious to know if this will help.
> > > >
> > > > > On Thu, Jun 20, 2024 at 6:16 PM Oleksandr Tkachuk <sasha547...@gmail.com> wrote:
> > > > >
> > > > > > FYI: There is a solution in the last paragraph, but I still ran your
> > > > > > tests, since the solution was found by trial and error and I have no
> > > > > > deep understanding of why it works.
> > > > >
> > > > > > >I wonder what would happen if you fully bypassed the query cache
> > > > > > >(i.e., `q={!cache=false}product_type:"1"`)?
> > > > > It does not help, there is not even one millisecond of difference in 
> > > > > both cases.
> > > > >
> > > > > >I recall that previously you had a very large number of dynamic 
> > > > > >fields. Is that the case here as well? And if so, are the dynamic 
> > > > > >fields mostly stored? docValues?
> > > > > This is another collection, I’ll get to the one with many many fields 
> > > > > later :))
> > > > > If this is the ~correct way to count the number of fields, then this
> > > > > collection has the following number of fields:
> > > > > > curl -s "http://localhost:8983/solr/XXX/admin/luke?numTerms=0" | grep '"type"' | wc -l
> > > > > 121
> > > > > > Of these, 88 have docValues enabled and 33 are stored.
> > > > >
> > > > > As for the two fields used in query, here's how they are defined in 
> > > > > the schema.
> > > > >   <field name="product_id" type="plong" indexed="true" stored="true"/>
> > > > > >   <field name="product_type" type="pint" indexed="true" stored="false"/>
> > > > > >   <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
> > > > > >   <fieldType name="plong" class="solr.LongPointField" docValues="true"/>
> > > > >
> > > > > Changing fl= to something like a string field with stored=true without
> > > > > docvalues results in zero changes.
> > > > > I also tried this simple query on string type fields (copying the
> > > > > field) and got the same result. I also tried it on fields where the
> > > > > cardinality was different - the spread was not 150 times, but also
> > > > > often noticeable. In addition, I still do not fully understand the
> > > > > logic of this behavior
> > > > > ("product_type":["3",1069282,"2",710042,"1",13702]) if I do:
> > > > > 1) q=product_type:"1" rows=50 - qtime 150ms
> > > > > 2) q=product_type:"1" rows=51 - qtime 0ms
> > > > > 3) q=product_type:"2" rows=50 - qtime 3ms
> > > > > 4) q=product_type:"2" rows=51 - qtime 0ms
> > > > > 5) q=product_type:"3" rows=50 - qtime 1ms
> > > > > 6) q=product_type:"3" rows=51 - qtime 0ms
> > > > > I checked on other fields and get the same behavior - the fewer
> > > > > documents contain a given value, the slower the query becomes.
> > > > > If I can provide any more information, I will be glad.
> > > > >
> > > > > The problem was solved by turning off enableLazyFieldLoading. I am
> > > > > very surprised that this functionality continues to work when document
> > > > > cache is disabled and I thought that this parameter was intended only
> > > > > > for it. In addition, we saw an improvement in average and 95th-percentile
> > > > > > latency on many other types of queries, as well as some reduction in CPU load. Are
> > > > > there any consequences or disadvantages of such a decision? If not,
> > > > > then perhaps it is worth paying attention to this problem.
> > > > >
> > > > > On Thu, Jun 20, 2024 at 10:13 PM Michael Gibney
> > > > > <mich...@michaelgibney.net> wrote:
> > > > > >
> > > > > > I've been unable to reproduce anything like this behavior. If you're
> > > > > > > really getting queryResultCache hits for these, then the field
> > > > > > > type/etc. of the field you're querying on shouldn't make a difference;
> > > > > > > the type/etc. of the return field (product_id) would be more likely to
> > > > > > > matter. I wonder what would happen if you fully bypassed the query
> > > > > > > cache (i.e., `q={!cache=false}product_type:"1"`)?
> > > > > >
> > > > > > I recall that previously you had a very large number of dynamic
> > > > > > fields. Is that the case here as well? And if so, are the dynamic
> > > > > > fields mostly stored? docValues?
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Fri, Jun 14, 2024 at 7:29 AM Oleksandr Tkachuk <sasha547...@gmail.com> wrote:
> > > > > > >
> > > > > > > Initial data:
> > > > > > > Doc count: 1793026
> > > > > > > Field: "product_type", point int, indexed true, stored false,
> > > > > > > docvalues true. Values:
> > > > > > >  "facet_fields":{
> > > > > > >       "product_type":["3",1069282,"2",710042,"1",13702]
> > > > > > >     },
> > > > > > > Single shard, single instance.
> > > > > > >
> > > > > > > # ./hey_linux_amd64 -n 10000 -c 10 -T "application/json"
> > > > > > > 'http://localhost:8983/solr/XXX/select?fl=product_id&wt=json&q=product_type:"1"&start=0&rows=51'
> > > > > > > Summary:
> > > > > > >   Total:        0.6374 secs
> > > > > > >   Slowest:      0.0043 secs
> > > > > > >   Fastest:      0.0003 secs
> > > > > > >   Average:      0.0006 secs
> > > > > > >   Requests/sec: 15688.5755
> > > > > > >
> > > > > > > # ./hey_linux_amd64 -n 10000 -c 10 -T "application/json"
> > > > > > > 'http://localhost:8983/solr/XXX/select?fl=product_id&wt=json&q=product_type:"1"&start=0&rows=50'
> > > > > > > Summary:
> > > > > > >   Total:        101.3246 secs
> > > > > > >   Slowest:      0.2048 secs
> > > > > > >   Fastest:      0.0564 secs
> > > > > > >   Average:      0.1007 secs
> > > > > > >   Requests/sec: 98.6927
> > > > > > >
> > > > > > >
> > > > > > > > 1) I've already played with queryResultWindowSize and
> > > > > > > > queryResultMaxDocsCached by setting different high and low values,
> > > > > > > > and this is probably not what I'm looking for, since it made less
> > > > > > > > than a few milliseconds of difference in query performance
> > > > > > > > 2) Checked on different versions of Solr (9.6.1 and 8.7.0) - no
> > > > > > > > significant changes
> > > > > > > > 3) Tried changing the field type to string - zero performance changes
> > > > > > > > 4) In both cases I see successful lookups in queryResultCache
> > > > > > > > 5) Enabling documentCache solves the problem in this case (rows<=50),
> > > > > > > > but introduces many other performance issues, so it doesn't seem like
> > > > > > > > a viable option.
