Sorry it took so long to get back to this one. The search query "http://host:8983/solr/v9/select?&q=*&rows=10" consistently takes ~500 ms. With "distrib=false", each of the 96 shards reports a QTime of 0-25 ms. Does this mean that aggregating the results from all the shards is taking ~475 ms? I also tried shards.rows=5 and the query still returned in ~475 ms. I am assuming that a star query is sorted by score; is that adding to the latency? Why would aggregation take so long? When I do "debug=true"
@Michael Gibney, could you please provide an example query to test the improvement implemented as part of SOLR-14765 <https://issues.apache.org/jira/browse/SOLR-14765>?

@Joel, thank you for that tip. The bottleneck seems to be the aggregator, together with queries that match a large set of documents, or "*" itself. The memory (RAM) on the nodes is the same as the index size, so it is not a memory/CPU/resource issue, and the heap is set to 25% of RAM.

A query "q=*&fl=id" also has ~500 ms latency. An edismax query with "qf" set to "keywords, title, description" that matches a large set of documents takes ~2-3 seconds. Any "boost" applied to it adds 2 more seconds. I am not sure whether shard size is the problem: there are ~5m docs and a 60 GB index per shard, though the RAM on the node is 128 GB.

I would appreciate any suggestions for optimizing query latency.

On Thu, Apr 20, 2023 at 7:30 AM Michael Gibney <mich...@michaelgibney.net> wrote:

> > It is a query with popularity and recency boosts, requesting the first 100
> > docs with 3 fields per doc.
>
> It sounds like you are scoring/sorting, so the optimization that
> Mikhail mentioned would not apply (your use-case is not
> "sort-irrelevant"). Can you share more about specifically how you're
> implementing/invoking your popularity/recency boosts, and how you're
> applying the "with three fields per doc" requirement?
>
> On Wed, Apr 19, 2023 at 5:23 PM Joel Bernstein <joels...@gmail.com> wrote:
> >
> > To send the query to a single shard you can add the parameter
> > "distrib=false" to the query and it will stay on that shard.
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Wed, Apr 19, 2023 at 5:21 PM Joel Bernstein <joels...@gmail.com> wrote:
> >
> > > You're hunting for a bottleneck. Here is how I would go about finding it:
> > >
> > > First I would run the query on a single shard and see how long it takes.
> > > If the single shard is slow you've found your bottleneck. If it's fast then
> > > try the same query on each shard, one of the shards might be slow and
> > > you've found your bottleneck.
> > >
> > > If all the shards are fast then it would seem the bottleneck is the
> > > aggregator node.
> > >
> > > Once you've found the bottleneck then you need to start improving the
> > > throughput. Let us know what you find and then we can move on to discuss
> > > how to improve the throughput at the bottleneck.
> > >
> > > If its very fast thats
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> > > On Wed, Apr 19, 2023 at 3:57 PM Rajani Maski <rajinima...@gmail.com>
> > > wrote:
> > >
> > >> Thank you, Mikhail.
> > >>
> > >>
> > >> On Wed, Apr 19, 2023 at 7:59 AM Mikhail Khludnev <m...@apache.org> wrote:
> > >>
> > >> > Hello, Rajani.
> > >> > I meant [SOLR-14765] optimize DocList creation by skipping sort for
> > >> > sort-irrelevant cases - ASF JIRA (apache.org)
> > >> > <https://issues.apache.org/jira/browse/SOLR-14765>
> > >> >
> > >> > On Wed, Apr 19, 2023 at 4:05 AM Rajani Maski <rajinima...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hi Mikhail,
> > >> > >
> > >> > > Yes, 9.1.1, that should be helpful, can you please point me to the
> > >> > > related jira(s) and/or docs?
> > >> > >
> > >> > > Thank you,
> > >> > > Rajani
> > >> > >
> > >> > > On Mon, Apr 17, 2023 at 2:09 AM Mikhail Khludnev <m...@apache.org>
> > >> > > wrote:
> > >> > >
> > >> > > > Hello Rajani.
> > >> > > > Which version are you running? IIRC 9.1.2 has some
> > >> > > > improvement about caching short queries.
> > >> > > >
> > >> > > > On Sun, Apr 16, 2023 at 4:25 PM Rajani Maski <rajinima...@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi Solr Users,
> > >> > > > >
> > >> > > > > What are your suggestions to improve star queries latencies? By star
> > >> > > > > queries I mean "*:*" or single term queries having boost formulas (such as
> > >> > > > > doc recency and many others) taking 10 or more seconds. It is a large
> > >> > > > > collection with good compute resources, however I am guessing this may be
> > >> > > > > because each shard has too many documents and I noticed per shard response
> > >> > > > > time also is high.
> > >> > > > >
> > >> > > > > Splitting shards could be an option however it is already an
> > >> > > > > evenly distributed, composite router, 96 shards collection, I am
> > >> > > > > concerned that more than 100 shards per collection can lead to exhaustively
> > >> > > > > searching too many shards and aggregation issues. What are your thoughts?
> > >> > > > >
> > >> > > > > Can we make use of any caches, query result cache or other caches, in solr
> > >> > > > > that allows warming up and persisting these queries results in ram, and
> > >> > > > > that maybe helps reduce this query time?
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Rajani
> > >> > > > >
> > >> > > >
> > >> > > > --
> > >> > > > Sincerely yours
> > >> > > > Mikhail Khludnev
> > >> > > > https://t.me/MUST_SEARCH
> > >> > > > A caveat: Cyrillic!
> > >> > > >
> > >> > >
> > >> >
> > >> > --
> > >> > Sincerely yours
> > >> > Mikhail Khludnev
> > >> > https://t.me/MUST_SEARCH
> > >> > A caveat: Cyrillic!
> > >> >
> > >>
> > >
> >