On 4/16/23 07:24, Rajani Maski wrote:
> What are your suggestions to improve star query latencies? By star
> queries I mean "*:*" queries or single-term queries with boost formulas
> (such as doc recency and many others) taking 10 or more seconds. It is a
> large collection with good compute resources, but I am guessing this may
> be because each shard has too many documents, and I noticed that per-shard
> response time is also high.

The "*:*" query is special query syntax that normally completes very quickly. It is not a wildcard query -- it is not actually treated internally by Lucene as a wildcard query on all fields. Running "q=*" is vastly different (and a LOT slower) than "q=*:*".

I would say that 500 milliseconds for pretty much ANY query with 96 shards is actually quite good. Improving on that would require a LOT of hardware that wouldn't be cheap.

> Splitting shards could be an option; however, it is already an evenly
> distributed 96-shard collection using the composite router. I am concerned
> that more than 100 shards per collection can lead to exhaustively searching
> too many shards and to aggregation issues. What are your thoughts?

Any worries you might have from having more than 100 shards are also present with 96 shards. That is a LOT of shards. Getting results is already taking at least 97 queries -- the main query itself, and a subquery for each shard, after which the main query must assemble the subquery results into a final response. I would personally say that 96 shards is probably way too many unless you've got several billion documents.
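
A couple of parameters can show where that time is going -- collection and core names below are placeholders:

  # Adds per-shard elapsed time and hit counts to the distributed response:
  http://localhost:8983/solr/yourcollection/select?q=*:*&shards.info=true

  # Sent straight to one replica, skipping the distributed fan-out entirely:
  http://localhost:8983/solr/yourcollection_shard1_replica_n1/select?q=*:*&distrib=false

Comparing the two will tell you whether the time is spent inside the individual shards or in assembling their results.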

> Can we make use of any caches in Solr (the query result cache or others)
> that allow warming up and keeping these query results in RAM, and that
> might help reduce this query time?

Can you share the FULL query you are running and how long it takes? If solrconfig.xml is setting any parameters in the handler definition, we'll need those too.
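
For reference, that part of solrconfig.xml looks something like this -- the handler name and the defaults shown are only placeholders:

  <requestHandler name="/select" class="solr.SearchHandler">
    <!-- any params set here are merged into every request to this handler -->
    <lst name="defaults">
      <str name="rows">10</str>
      <str name="df">text</str>
    </lst>
  </requestHandler>

Copy/paste whatever is actually in yours.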

Looking through the thread... Setting the heap to a percentage of RAM is bad advice. It needs to be set according to what the program actually needs. 25 percent of 128GB is 32GB. It seems very unlikely that you would actually need a heap that large, but I can't definitively say that without more information.
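
If you do end up changing the heap, it is set in solr.in.sh -- the value below is only an example, not a recommendation:

  # Pick a size based on observed heap usage (GC logs or the admin UI),
  # not on a percentage of total RAM.
  SOLR_HEAP="8g"

Leaving the rest of the RAM to the OS disk cache is usually what makes queries fast.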

How have you configured the caches in solrconfig.xml?
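
For reference, they are defined in solrconfig.xml with entries something like the following -- the sizes and the warming query are placeholders, not recommendations:

  <filterCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0"/>

  <!-- replays the listed queries against each new searcher so they are answered warm -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str><str name="rows">10</str></lst>
    </arr>
  </listener>

Note that the queryResultCache only helps when the exact same query (including boosts and sort) is repeated, and all of the caches are discarded whenever a commit opens a new searcher.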

How many shard replicas (cores) live on each node? How big is each shard replica's on-disk index? How many total nodes are there? Are you running multiple Solr nodes per server?
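
If you need to gather that, CLUSTERSTATUS plus a quick disk check on each node will show it -- host and data path below are placeholders:

  # Every collection, shard, and replica, with the node each replica lives on:
  http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS

  # On-disk index size of each core on a node:
  du -sh /var/solr/data/*/data/index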

Thanks,
Shawn
