An edge n-gram tokenizer could be useful. That would shift query-time compute to index time, at the cost of a bigger index.
Sent from my iPhone

> On Nov 1, 2023, at 7:02 PM, rajani m <rajinima...@gmail.com> wrote:
>
> Sorry, it took too long to get back to this one.
>
> The search query "http://host:8983/solr/v9/select?&q=*&rows=10" consistently
> took ~500 ms. With "distrib=false" all the 96 shards have QTime 0-25 ms.
> Does this mean aggregation of results from all the shards is taking ~475
> ms? I also tried shards.rows=5 and it still returned in ~475 ms query time.
> I am assuming the sort for a star query is by score, is that adding to such
> high latency? Why would aggregation take so long? When I do "debug=true"
>
> @Michael Gibney, could you please provide me with an example query to test
> the improvement implemented as part of SOLR-14765
> <https://issues.apache.org/jira/browse/SOLR-14765>
>
> @Joel thank you for that tip, the bottleneck seems to be the aggregator and
> query matching a large set of documents or "*" itself.
>
> The memory(ram) on the nodes is the same as index size so it is not a
> memory/cpu/resource issue and the heap is set to 25% of ram. A query "
> q=*&fl=id" also has ~500ms latency. An edismax query "qf" "keywords, title,
> description" matching a large set of documents is taking ~2-3 seconds. Any
> "boost" applied to it is adding 2 more seconds. Not sure if it is shard
> size that is the problem, there are ~5m docs and a 60 gb index size per
> shard, though the ram on the node is 128gb.
>
> Appreciate any suggestions for optimizing the queries latency.
>
>
>
> On Thu, Apr 20, 2023 at 7:30 AM Michael Gibney <mich...@michaelgibney.net>
> wrote:
>
>>> It is a query with popularity and recency boosts, requesting the first
>>> 100 docs with 3 fields per doc.
>>
>> It sounds like you are scoring/sorting, so the optimization that
>> Mikhail mentioned would not apply (your use-case is not
>> "sort-irrelevant").
>> Can you share more about specifically how you're
>> implementing/invoking your popularity/recency boosts, and how you're
>> applying the "with three fields per doc" requirement?
>>
>> On Wed, Apr 19, 2023 at 5:23 PM Joel Bernstein <joels...@gmail.com> wrote:
>>>
>>> To send the query to a single shard you can add the parameter
>>> "distrib=false" to the query and it will stay on that shard.
>>>
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>>
>>> On Wed, Apr 19, 2023 at 5:21 PM Joel Bernstein <joels...@gmail.com> wrote:
>>>
>>>> You're hunting for a bottleneck. Here is how I would go about finding it:
>>>>
>>>> First I would run the query on a single shard and see how long it takes.
>>>> If the single shard is slow you've found your bottleneck. If it's fast then
>>>> try the same query on each shard, one of the shards might be slow and
>>>> you've found your bottleneck.
>>>>
>>>> If all the shards are fast then it would seem the bottleneck is the
>>>> aggregator node.
>>>>
>>>> Once you've found the bottleneck then you need to start improving the
>>>> throughput. Let us know what you find and then we can move on to discuss
>>>> how to improve the throughput at the bottleneck.
>>>>
>>>> If it's very fast that's
>>>>
>>>>
>>>>
>>>> Joel Bernstein
>>>> http://joelsolr.blogspot.com/
>>>>
>>>>
>>>> On Wed, Apr 19, 2023 at 3:57 PM Rajani Maski <rajinima...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you, Mikhail.
>>>>>
>>>>>
>>>>> On Wed, Apr 19, 2023 at 7:59 AM Mikhail Khludnev <m...@apache.org> wrote:
>>>>>
>>>>>> Hello, Rajani.
>>>>>> I meant [SOLR-14765] optimize DocList creation by skipping sort for
>>>>>> sort-irrelevant cases - ASF JIRA (apache.org)
>>>>>> <https://issues.apache.org/jira/browse/SOLR-14765>
>>>>>>
>>>>>> On Wed, Apr 19, 2023 at 4:05 AM Rajani Maski <rajinima...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Mikhail,
>>>>>>>
>>>>>>> Yes, 9.1.1, that should be helpful, can you please point me to the
>>>>>>> related jira(s) and/or docs?
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Rajani
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Apr 17, 2023 at 2:09 AM Mikhail Khludnev <m...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello Rajani.
>>>>>>>> Which version are you running? IIRC 9.1.2 has some
>>>>>>>> improvement about caching short queries.
>>>>>>>>
>>>>>>>> On Sun, Apr 16, 2023 at 4:25 PM Rajani Maski <rajinima...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Solr Users,
>>>>>>>>>
>>>>>>>>> What are your suggestions to improve star queries latencies? By star
>>>>>>>>> queries I mean "*:*" or single term queries having boost formulas (such as
>>>>>>>>> doc recency and many others) taking 10 or more seconds. It is a large
>>>>>>>>> collection with good compute resources, however I am guessing this may be
>>>>>>>>> because each shard has too many documents and I noticed per shard response
>>>>>>>>> time also is high.
>>>>>>>>>
>>>>>>>>> Splitting shards could be an option however it is already an
>>>>>>>>> evenly distributed, composite router, 96 shards collection, I am
>>>>>>>>> concerned that more than 100 shards per collection can lead to exhaustively
>>>>>>>>> searching too many shards and aggregation issues. What are your thoughts?
>>>>>>>>>
>>>>>>>>> Can we make use of any caches, query result cache or other caches, in solr
>>>>>>>>> that allows warming up and persisting these queries results in ram, and
>>>>>>>>> that maybe helps reduce this query time?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Rajani
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sincerely yours
>>>>>>>> Mikhail Khludnev
>>>>>>>> https://t.me/MUST_SEARCH
>>>>>>>> A caveat: Cyrillic!
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sincerely yours
>>>>>> Mikhail Khludnev
>>>>>> https://t.me/MUST_SEARCH
>>>>>> A caveat: Cyrillic!
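
For anyone repeating the bottleneck hunt described in this thread (run the query on each shard with "distrib=false", then compare against the distributed query), here is a rough Python sketch of that comparison. It is only an illustration under assumptions: the core URLs are hypothetical placeholders, the helper names are made up for this example, and the "unaccounted" figure is a crude estimate that treats shard requests as running in parallel. It relies only on the "distrib=false" parameter mentioned above and on the QTime value that Solr reports in the responseHeader of a JSON response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(core_url, q="*:*", rows=10, distrib=True):
    """Build a /select URL; distrib=False keeps the query on that one core."""
    params = {"q": q, "rows": rows, "wt": "json"}
    if not distrib:
        params["distrib"] = "false"
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def qtime_ms(response):
    """Extract QTime (milliseconds) from a parsed Solr JSON response."""
    return int(response["responseHeader"]["QTime"])


def fetch_qtime(url, timeout=30):
    """Run the query over HTTP and return the QTime Solr reports for it."""
    with urlopen(url, timeout=timeout) as resp:
        return qtime_ms(json.load(resp))


def summarize(shard_qtimes, distributed_qtime):
    """Compare per-shard QTimes with the distributed QTime.

    Assumption: shards are queried in parallel, so the slowest shard is a
    rough lower bound and the remainder is time spent outside the per-shard
    search (merging, extra round trips, network). This is an estimate only.
    """
    slowest = max(shard_qtimes, key=shard_qtimes.get)
    slowest_ms = shard_qtimes[slowest]
    return {
        "slowest_shard": slowest,
        "slowest_shard_ms": slowest_ms,
        "unaccounted_ms": max(0, distributed_qtime - slowest_ms),
    }
```

With the numbers reported in this thread (per-shard QTime of at most 25 ms, distributed query at ~500 ms), summarize() would put roughly 475 ms outside the per-shard search, which matches the ~475 ms estimate given above. To use it against a live cluster, call fetch_qtime(build_select_url(core, distrib=False)) for each replica core URL and once more with the collection URL and the default distrib=True.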