FYI: There is a solution in the last paragraph, but I still ran your
tests, since the solution was found by trial and error and I have no
deep understanding of why it works.

>I wonder what would happen if you fully bypassed the query cache (i.e., 
>`q={!cache=false}product_type:"1"`?
It does not help; there is not even one millisecond of difference between the two cases.

>I recall that previously you had a very large number of dynamic fields. Is 
>that the case here as well? And if so, are the dynamic fields mostly stored? 
>docValues?
This is another collection, I’ll get to the one with many many fields later :))
If this is a roughly correct way to count the number of fields, then this
collection has the following number of fields:
curl -s "http://localhost:8983/solr/XXX/admin/luke?numTerms=0" | grep '"type"' | wc -l
121
Of these, 88 have docValues enabled and 33 are stored.
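
As a cross-check on the grep count above, the same breakdown could probably be read from the Luke JSON directly. This is only a sketch under assumptions: it assumes the response has a top-level "fields" object whose entries carry a "schema" flag string, with S marking stored and D marking docValues (the legend should be in the response's info.key). The function name is made up for illustration.

```python
import json

def summarize_luke_fields(luke_json: str) -> dict:
    """Count fields in a Luke response and how many are flagged stored / docValues."""
    fields = json.loads(luke_json).get("fields", {})
    summary = {"total": len(fields), "stored": 0, "doc_values": 0}
    for info in fields.values():
        # Assumption: "schema" is a flag string such as "I-SD..." where
        # uppercase S = stored and uppercase D = docValues.
        flags = info.get("schema", "")
        if "S" in flags:
            summary["stored"] += 1
        if "D" in flags:
            summary["doc_values"] += 1
    return summary
```

Feeding it the body returned by the curl command above (with wt=json) should give the total alongside the stored and docValues counts in one pass.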

As for the two fields used in query, here's how they are defined in the schema.
  <field name="product_id" type="plong" indexed="true" stored="true"/>
  <field name="product_type" type="pint" indexed="true" stored="false"/>
  <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
  <fieldType name="plong" class="solr.LongPointField" docValues="true"/>

Changing fl= to something else, such as a string field with stored=true and
no docValues, changes nothing.
I also tried this simple query on string-type fields (by copying the
field) and got the same result. I also tried it on fields with
different cardinality: the gap was not 150x, but it was
often still noticeable. In addition, I still do not fully understand the
logic of this behavior
("product_type":["3",1069282,"2",710042,"1",13702]) if I do:
1) q=product_type:"1" rows=50 - qtime 150ms
2) q=product_type:"1" rows=51 - qtime 0ms
3) q=product_type:"2" rows=50 - qtime 3ms
4) q=product_type:"2" rows=51 - qtime 0ms
5) q=product_type:"3" rows=50 - qtime 1ms
6) q=product_type:"3" rows=51 - qtime 0ms
I checked other fields and got the same behavior: the fewer
documents contain a given value, the slower the query becomes.
If I can provide any more information, I will be glad to.
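
In case someone wants to repeat the rows=50 versus rows=51 comparison above, here is a rough sketch of how it could be scripted. It relies only on the QTime value in the responseHeader of the JSON response; the helper names and the default base URL layout are my own assumptions, and compare_rows needs a running Solr instance.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_select_url(base: str, value: str, rows: int) -> str:
    """Build the same /select request as in the list above for one product_type value."""
    params = {
        "q": f'product_type:"{value}"',
        "fl": "product_id",
        "wt": "json",
        "start": 0,
        "rows": rows,
    }
    return f"{base}/select?{urlencode(params)}"

def qtime_ms(response_body) -> int:
    """Extract the QTime (milliseconds) reported in the response header."""
    return json.loads(response_body)["responseHeader"]["QTime"]

def compare_rows(base: str, values=("1", "2", "3"), rows_pair=(50, 51)) -> dict:
    """Query each value with each rows setting; returns {(value, rows): QTime}."""
    results = {}
    for value in values:
        for rows in rows_pair:
            with urlopen(build_select_url(base, value, rows)) as resp:
                results[(value, rows)] = qtime_ms(resp.read())
    return results
```

Calling compare_rows("http://localhost:8983/solr/XXX") would return one QTime per (value, rows) pair. A single request is noisy, so repeating the call and averaging is probably wise.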

The problem was solved by turning off enableLazyFieldLoading. I am
very surprised that this setting still has an effect when the document
cache is disabled; I thought this parameter was intended only
for it. In addition, we saw an improvement in average and 95th-percentile
latency on many other types of queries, as well as some reduction in CPU
load. Are there any consequences or disadvantages to this change? If not,
then perhaps this problem deserves some attention.

On Thu, Jun 20, 2024 at 10:13 PM Michael Gibney
<mich...@michaelgibney.net> wrote:
>
> I've been unable to reproduce anything like this behavior. If you're
> really getting queryResultCache hits for these, then the field
> type/etc of the field you're querying on shouldn't make a difference.
> type/etc of the return field (product_id) would be more likely to
> matter. I wonder what would happen if you fully bypassed the query
> cache (i.e., `q={!cache=false}product_type:"1"`?
>
> I recall that previously you had a very large number of dynamic
> fields. Is that the case here as well? And if so, are the dynamic
> fields mostly stored? docValues?
>
>
>
> On Fri, Jun 14, 2024 at 7:29 AM Oleksandr Tkachuk <sasha547...@gmail.com> 
> wrote:
> >
> > Initial data:
> > Doc count: 1793026
> > Field: "product_type", point int, indexed true, stored false,
> > docvalues true. Values:
> >  "facet_fields":{
> >       "product_type":["3",1069282,"2",710042,"1",13702]
> >     },
> > Single shard, single instance.
> >
> > # ./hey_linux_amd64 -n 10000 -c 10 -T "application/json"
> > 'http://localhost:8983/solr/XXX/select?fl=product_id&wt=json&q=product_type:"1"&start=0&rows=51'
> > Summary:
> >   Total:        0.6374 secs
> >   Slowest:      0.0043 secs
> >   Fastest:      0.0003 secs
> >   Average:      0.0006 secs
> >   Requests/sec: 15688.5755
> >
> > # ./hey_linux_amd64 -n 10000 -c 10 -T "application/json"
> > 'http://localhost:8983/solr/XXX/select?fl=product_id&wt=json&q=product_type:"1"&start=0&rows=50'
> > Summary:
> >   Total:        101.3246 secs
> >   Slowest:      0.2048 secs
> >   Fastest:      0.0564 secs
> >   Average:      0.1007 secs
> >   Requests/sec: 98.6927
> >
> >
> > 1) I've already played with queryResultWindowSize and
> > queryResultMaxDocsCached by setting different, high and low values and
> > this is probably not what I'm looking for since it gave a <few
> > milliseconds difference in query performance
> > 2) Checked on different versions of solr (9.6.1 and 8.7.0) - no
> > significant changes
> > 3) Tried changing the field type to string - zero performance changes
> > 4) In both cases I see successful lookups in queryResultCache
> > 5) Enabling documentCache solves the problem in this case (rows<=50),
> > but introduces many other performance issues so it doesn't seem like a
> > viable option.
