FYI: https://github.com/apache/solr/pull/2551
On Mon, Jul 8, 2024 at 9:55 AM Michael Gibney <mich...@michaelgibney.net> wrote:
>
> Thanks for reporting back. Found the issue at last, including the
> magic number! Will post a fix for this shortly.
>
> https://github.com/apache/solr/blob/aec6e8f750037fea5f8d01dc49dabf28bf512d68/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L568-L569
>
> On Mon, Jul 8, 2024 at 9:05 AM Oleksandr Tkachuk <sasha547...@gmail.com> wrote:
> >
> > Hello.
> > Unfortunately it didn't help. Still a huge difference between 50 vs 51
> > and disabling enableLazyFieldLoading in solrconfig.xml still helps.
> >
> > solr-impl 10.0.0-SNAPSHOT 011d713a884559e3efeaa69e4f3c8dd8e630ff22
> > [snapshot build, details omitted]
> > cat solr/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java
> > | head -370 | tail -13
> >     SolrDocumentStoredFieldVisitor(Set<String> toLoad, IndexReader reader, int docId) {
> >       super(toLoad);
> >       this.docId = docId;
> >       this.doc = getDocument();
> >       if (documentCache == null) {
> >         // lazy loading makes no sense if we don't have a `documentCache`
> >         this.lazyFieldProducer = null;
> >         this.addLargeFieldsLazily = false;
> >       } else {
> >         this.lazyFieldProducer =
> >             toLoad != null && enableLazyFieldLoading ? new LazyDocument(reader, docId) : null;
> >         this.addLargeFieldsLazily = !largeFields.isEmpty();
> >       }
> >
> >
> > On Wed, Jun 26, 2024 at 5:10 AM Michael Gibney <mich...@michaelgibney.net> wrote:
> > >
> > > FYI:
> > > https://issues.apache.org/jira/browse/SOLR-17349
> > > https://github.com/apache/solr/pull/2535
> > >
> > > I'm curious whether this helps!
> > >
> > > On Fri, Jun 21, 2024 at 3:08 PM Oleksandr Tkachuk <sasha547...@gmail.com> wrote:
> > > >
> > > > >If you're set up to try running a patched version on your data, I'm
> > > > >curious to know if this will help.
> > > > I'll be happy to do this.
> > > >
> > > > >But maybe it's not so much a magic threshold as arbitrary, and
> > > > >specific to the data you're evaluating over.
> > > > Well, I tested this case on the collection that you remembered, with a
> > > > large number of fields (564133 at this moment) and more documents
> > > > there (~68 million documents). The number of documents and their
> > > > content are significantly different there from where I tested
> > > > previously. And I can say that I was quickly able to reproduce the
> > > > problem with magic number 50(51), although not as noticeable as in the
> > > > previous one. I confirmed this on absolutely any cardinality and any
> > > > variance using hey (I’m more than sure that it will be reproduced on
> > > > any other benchmark). Although qtime did not differ visually or did
> > > > not differ as much as we would like, with the intensity of queries the
> > > > difference grows significantly (but still easier to reproduce on fl=
> > > > data that has high unevenness and low cardinality), for example:
> > > > Huge cardinality, values almost completely unique:
> > > > ./hey_linux_amd64 -n 10000 -c 5 -T "application/json"
> > > > 'http://localhost:8983/solr/col/select?fl=fld1&wt=json&q=fld1:"fld1value"&start=0&rows=50'
> > > > Slowest: 0.0024 secs
> > > > Fastest: 0.0009 secs
> > > > Average: 0.0013 secs
> > > > Requests/sec: 3768.1874
> > > >
> > > > ./hey_linux_amd64 -n 10000 -c 5 -T "application/json"
> > > > 'http://localhost:8983/solr/col/select?fl=fld1&wt=json&q=fld1:"fld1value"&start=0&rows=51'
> > > > Slowest: 0.0018 secs
> > > > Fastest: 0.0007 secs
> > > > Average: 0.0009 secs
> > > > Requests/sec: 5620.4994
> > > >
> > > > Just 1.5x diff
> > > >
> > > >
> > > > "fld2":["v1",30501964,"v2",4202177,"v3",210886] :
> > > > ./hey_linux_amd64 -n 10000 -c 5 -T "application/json"
> > > > 'http://localhost:8983/solr/col/select?fl=fld2&wt=json&q=fld2:"v3"&start=0&rows=50'
> > > > Slowest: 0.1198 secs
> > > > Fastest: 0.0013 secs
> > > > Average: 0.0019 secs
> > > > Requests/sec: 2641.0227
> > > >
> > > > ./hey_linux_amd64 -n 10000 -c 5 -T "application/json"
> > > > 'http://localhost:8983/solr/col/select?fl=fld2&wt=json&q=fld2:"v3"&start=0&rows=51'
> > > > Slowest: 0.0051 secs
> > > > Fastest: 0.0003 secs
> > > > Average: 0.0003 secs
> > > > Requests/sec: 14610.4688
> > > >
> > > > ./hey_linux_amd64 -n 10000 -c 5 -T "application/json"
> > > > 'http://localhost:8983/solr/col/select?fl=fld2&wt=json&q=fld2:"v1"&start=0&rows=50'
> > > > Slowest: 0.0059 secs
> > > > Fastest: 0.0008 secs
> > > > Average: 0.0010 secs
> > > > Requests/sec: 4795.5539
> > > >
> > > > ./hey_linux_amd64 -n 10000 -c 5 -T "application/json"
> > > > 'http://localhost:8983/solr/col/select?fl=fld2&wt=json&q=fld2:"v1"&start=0&rows=51'
> > > > Slowest: 0.0010 secs
> > > > Fastest: 0.0003 secs
> > > > Average: 0.0003 secs
> > > > Requests/sec: 14726.7978
> > > >
> > > > 4-6x diff
> > > >
> > > > On Fri, Jun 21, 2024 at 4:59 PM Michael Gibney <mich...@michaelgibney.net> wrote:
> > > > >
> > > > > Interesting! If turning off lazy field loading helps, I think I have a
> > > > > trivial patch that may fix this (i.e. without requiring the workaround
> > > > > of disabling lazy field loading -- which, as you say, makes no sense
> > > > > to have in effect without the documentCache). The only thing that had
> > > > > been stopping me from suggesting this patch right off the bat was the
> > > > > "magic" threshold of 50, which I couldn't explain at all. But maybe
> > > > > it's not so much a magic threshold as arbitrary, and specific to the
> > > > > data you're evaluating over. I'll open an issue/PR more narrowly
> > > > > scoped to the change. I'd say you could open the issue, except I still
> > > > > don't fully understand the connection between the change I'm
> > > > > considering and the behavior you're seeing -- just that they seem very
> > > > > likely to be connected.
> > > > > If you're set up to try running a patched
> > > > > version on your data, I'm curious to know if this will help.
> > > > >
> > > > > On Thu, Jun 20, 2024 at 6:16 PM Oleksandr Tkachuk <sasha547...@gmail.com> wrote:
> > > > > >
> > > > > > FYI: There is a solution in the last paragraph, but I still ran your
> > > > > > tests, since the solution was found by "Cut and Try" and there is no
> > > > > > deep understanding.
> > > > > >
> > > > > > >I wonder what would happen if you fully bypassed the query cache
> > > > > > >(i.e., `q={!cache=false}product_type:"1"`?
> > > > > > It does not help, there is not even one millisecond of difference
> > > > > > in both cases.
> > > > > >
> > > > > > >I recall that previously you had a very large number of dynamic
> > > > > > >fields. Is that the case here as well? And if so, are the dynamic
> > > > > > >fields mostly stored? docValues?
> > > > > > This is another collection, I’ll get to the one with many many
> > > > > > fields later :))
> > > > > > If this is the ~correct way to count the number of fields, then this
> > > > > > collection has the following number of fields:
> > > > > > curl -s "http://localhost:8983/solr/XXX/admin/luke?numTerms=0" | grep '"type"' | wc -l
> > > > > > 121
> > > > > > Of these, 88 have docvalues enabled and 33 stored.
> > > > > >
> > > > > > As for the two fields used in query, here's how they are defined in
> > > > > > the schema.
> > > > > > <field name="product_id" type="plong" indexed="true" stored="true"/>
> > > > > > <field name="product_type" type="pint" indexed="true" stored="false"/>
> > > > > > <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
> > > > > > <fieldType name="plong" class="solr.LongPointField" docValues="true"/>
> > > > > >
> > > > > > Changing fl= to something like a string field with stored=true without
> > > > > > docvalues results in zero changes.
> > > > > > I also tried this simple query on string type fields (copying the
> > > > > > field) and got the same result. I also tried it on fields where the
> > > > > > cardinality was different - the spread was not 150 times, but also
> > > > > > often noticeable. In addition, I still do not fully understand the
> > > > > > logic of this behavior
> > > > > > ("product_type":["3",1069282,"2",710042,"1",13702]) if I do:
> > > > > > 1) q=product_type:"1" rows=50 - qtime 150ms
> > > > > > 2) q=product_type:"1" rows=51 - qtime 0ms
> > > > > > 3) q=product_type:"2" rows=50 - qtime 3ms
> > > > > > 4) q=product_type:"2" rows=51 - qtime 0ms
> > > > > > 5) q=product_type:"3" rows=50 - qtime 1ms
> > > > > > 6) q=product_type:"3" rows=51 - qtime 0ms
> > > > > > I checked on other fields and get the same behavior - the fewer
> > > > > > documents contain a given value, the slower the query becomes.
> > > > > > If I can provide any more information, I will be glad.
> > > > > >
> > > > > > The problem was solved by turning off enableLazyFieldLoading. I am
> > > > > > very surprised that this functionality continues to work when document
> > > > > > cache is disabled and I thought that this parameter was intended only
> > > > > > for it. In addition, we received an improvement in avg and 95% on many
> > > > > > other types of queries, as well as some reduction in CPU load. Are
> > > > > > there any consequences or disadvantages of such a decision? If not,
> > > > > > then perhaps it is worth paying attention to this problem.
> > > > > >
> > > > > > On Thu, Jun 20, 2024 at 10:13 PM Michael Gibney <mich...@michaelgibney.net> wrote:
> > > > > > >
> > > > > > > I've been unable to reproduce anything like this behavior. If you're
> > > > > > > really getting queryResultCache hits for these, then the field
> > > > > > > type/etc of the field you're querying on shouldn't make a difference.
> > > > > > > type/etc of the return field (product_id) would be more likely to
> > > > > > > matter. I wonder what would happen if you fully bypassed the query
> > > > > > > cache (i.e., `q={!cache=false}product_type:"1"`?
> > > > > > >
> > > > > > > I recall that previously you had a very large number of dynamic
> > > > > > > fields. Is that the case here as well? And if so, are the dynamic
> > > > > > > fields mostly stored? docValues?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jun 14, 2024 at 7:29 AM Oleksandr Tkachuk <sasha547...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Initial data:
> > > > > > > > Doc count: 1793026
> > > > > > > > Field: "product_type", point int, indexed true, stored false,
> > > > > > > > docvalues true. Values:
> > > > > > > > "facet_fields":{
> > > > > > > > "product_type":["3",1069282,"2",710042,"1",13702]
> > > > > > > > },
> > > > > > > > Single shard, single instance.
> > > > > > > >
> > > > > > > > # ./hey_linux_amd64 -n 10000 -c 10 -T "application/json"
> > > > > > > > 'http://localhost:8983/solr/XXX/select?fl=product_id&wt=json&q=product_type:"1"&start=0&rows=51'
> > > > > > > > Summary:
> > > > > > > > Total: 0.6374 secs
> > > > > > > > Slowest: 0.0043 secs
> > > > > > > > Fastest: 0.0003 secs
> > > > > > > > Average: 0.0006 secs
> > > > > > > > Requests/sec: 15688.5755
> > > > > > > >
> > > > > > > > # ./hey_linux_amd64 -n 10000 -c 10 -T "application/json"
> > > > > > > > 'http://localhost:8983/solr/XXX/select?fl=product_id&wt=json&q=product_type:"1"&start=0&rows=50'
> > > > > > > > Summary:
> > > > > > > > Total: 101.3246 secs
> > > > > > > > Slowest: 0.2048 secs
> > > > > > > > Fastest: 0.0564 secs
> > > > > > > > Average: 0.1007 secs
> > > > > > > > Requests/sec: 98.6927
> > > > > > > >
> > > > > > > >
> > > > > > > > 1) I've already played with queryResultWindowSize and
> > > > > > > > queryResultMaxDocsCached by setting different, high and low values and
> > > > > > > > this is probably not what I'm looking for since it gave a <few
> > > > > > > > milliseconds difference in query performance
> > > > > > > > 2) Checked on different versions of solr (9.6.1 and 8.7.0) - no
> > > > > > > > significant changes
> > > > > > > > 3) Tried changing the field type to string - zero performance changes
> > > > > > > > 4) In both cases I see successful lookups in queryResultCache
> > > > > > > > 5) Enabling documentCache solves the problem in this case (rows<=50),
> > > > > > > > but introduces many other performance issues so it doesn't seem like a
> > > > > > > > viable option.
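
For readers following the thread: the SolrDocumentFetcher constructor quoted in the Jul 8 message comes down to a small decision about when a lazy field producer gets created. The sketch below restates only that branch structure as a standalone method. The class name, method name and parameter names are hypothetical and are not Solr API; it is an illustration of the quoted logic, not the actual patch. Note that, per the Jul 8 report, this guard alone did not remove the 50 vs 51 difference.

```java
import java.util.Set;

/**
 * Illustrative only: restates the branch structure of the constructor quoted
 * in this thread. Class and method names are hypothetical, not Solr API.
 */
final class LazyLoadingGuardSketch {

    /**
     * @param hasDocumentCache       stands in for "documentCache != null"
     * @param enableLazyFieldLoading stands in for the solrconfig.xml flag
     * @param toLoad                 set of field names to load; may be null
     * @return true if the quoted code would create a lazy field producer
     */
    static boolean useLazyFieldProducer(
            boolean hasDocumentCache, boolean enableLazyFieldLoading, Set<String> toLoad) {
        if (!hasDocumentCache) {
            // Quoted comment: lazy loading makes no sense without a documentCache.
            return false;
        }
        // Same condition as the quoted ternary.
        return toLoad != null && enableLazyFieldLoading;
    }

    public static void main(String[] args) {
        Set<String> fl = Set.of("product_id");
        System.out.println("no cache, flag on : " + useLazyFieldProducer(false, true, fl));
        System.out.println("cache, flag on    : " + useLazyFieldProducer(true, true, fl));
        System.out.println("cache, flag off   : " + useLazyFieldProducer(true, false, fl));
        System.out.println("cache, toLoad null: " + useLazyFieldProducer(true, true, null));
    }
}
```

Under this reading, setting enableLazyFieldLoading to false and running without a documentCache both lead to the same outcome (no lazy producer), which is consistent with the workaround reported above.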