On 4/16/23 07:24, Rajani Maski wrote:
> What are your suggestions to improve star query latencies? By star
> queries I mean "*:*" queries or single-term queries with boost formulas
> (such as doc recency and many others) taking 10 or more seconds. It is a
> large collection with good compute resources, but I am guessing this may
> be because each shard has too many documents, and I noticed that per-shard
> response time is also high.

The "*:*" query is special query syntax that normally completes very quickly. It is not a wildcard query -- it is not actually treated internally by Lucene as a wildcard query on all fields. Running "q=*" is vastly different (and a LOT slower) than "q=*:*".

I would say that 500 milliseconds for pretty much ANY query with 96 shards is actually quite good. Improving on that would require a LOT of hardware that wouldn't be cheap.

> Splitting shards could be an option; however, it is already an evenly
> distributed 96-shard collection using the composite router. I am concerned
> that more than 100 shards per collection can lead to exhaustively searching
> too many shards and to aggregation issues. What are your thoughts?

Any worries you might have from having more than 100 shards are also present with 96 shards. That is a LOT of shards. Getting results is already taking at least 97 queries -- the main query itself, and a subquery for each shard, after which the main query must assemble the subquery results into a final response. I would personally say that 96 shards is probably way too many unless you've got several billion documents.
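
A couple of parameters can show where that time is going -- collection and core names below are placeholders:

  # Adds per-shard elapsed time and hit counts to the distributed response:
  http://localhost:8983/solr/yourcollection/select?q=*:*&shards.info=true

  # Sent straight to one replica, skipping the distributed fan-out entirely:
  http://localhost:8983/solr/yourcollection_shard1_replica_n1/select?q=*:*&distrib=false

Comparing the two will tell you whether the time is spent inside the individual shards or in assembling their results.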

> Can we make use of any caches in Solr (the query result cache or others)
> that allow warming up and keeping these query results in RAM, and that
> might help reduce this query time?

Can you share the FULL query you are running and how long it takes? If solrconfig.xml is setting any parameters in the handler definition, we'll need those too.
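
For reference, that part of solrconfig.xml looks something like this -- the handler name and the defaults shown are only placeholders:

  <requestHandler name="/select" class="solr.SearchHandler">
    <!-- any params set here are merged into every request to this handler -->
    <lst name="defaults">
      <str name="rows">10</str>
      <str name="df">text</str>
    </lst>
  </requestHandler>

Copy/paste whatever is actually in yours.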

Looking through the thread... Setting the heap to a percentage of RAM is bad advice. It needs to be set according to what the program actually needs. 25 percent of 128GB is 32GB. It seems very unlikely that you would actually need a heap that large, but I can't definitively say that without more information.
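
If you do end up changing the heap, it is set in solr.in.sh -- the value below is only an example, not a recommendation:

  # Pick a size based on observed heap usage (GC logs or the admin UI),
  # not on a percentage of total RAM.
  SOLR_HEAP="8g"

Leaving the rest of the RAM to the OS disk cache is usually what makes queries fast.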

How have you configured the caches in solrconfig.xml?
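
For reference, they are defined in solrconfig.xml with entries something like the following -- the sizes and the warming query are placeholders, not recommendations:

  <filterCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0"/>

  <!-- replays the listed queries against each new searcher so they are answered warm -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str><str name="rows">10</str></lst>
    </arr>
  </listener>

Note that the queryResultCache only helps when the exact same query (including boosts and sort) is repeated, and all of the caches are discarded whenever a commit opens a new searcher.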

How many shard replicas (cores) live on each node? How big is each shard replica's on-disk index? How many total nodes are there? Are you running multiple Solr nodes per server?
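
If you need to gather that, CLUSTERSTATUS plus a quick disk check on each node will show it -- host and data path below are placeholders:

  # Every collection, shard, and replica, with the node each replica lives on:
  http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS

  # On-disk index size of each core on a node:
  du -sh /var/solr/data/*/data/index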

Thanks,
Shawn
