Re: Suggestions to improve Star queries latencies

rajani m Wed, 01 Nov 2023 19:02:47 -0700

Sorry, it took too long to get back to this one.

 The search query "http://host:8983/solr/v9/select?&q=*&rows=10" consistently
took ~500 ms.  With "distrib=false", all 96 shards have a QTime of 0-25 ms.
Does this mean aggregating the results from all the shards is taking ~475
ms? I also tried shards.rows=5 and it still returned in ~475 ms query time.
I am assuming the sort for a star query is by score; is that adding to such
high latency? Why would aggregation take so long? When I do "debug=true"
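
For anyone repeating this measurement, here is a minimal sketch of the
comparison described above. It assumes per-shard cores can be queried
directly; the core name "v9_shard1_replica_n1" is hypothetical, and the
"debug=timing" parameter is added only to get Solr's per-component timings.
The overhead figure is a rough estimate that subtracts the slowest shard's
QTime from the distributed time, since shards are queried in parallel.

```python
from urllib.parse import urlencode


def shard_query_url(core_url, q="*", rows=10):
    """Build a non-distributed query URL for a single shard core."""
    params = {"q": q, "rows": rows, "distrib": "false", "debug": "timing"}
    return f"{core_url}/select?{urlencode(params)}"


def aggregation_overhead_ms(distributed_ms, shard_qtimes_ms):
    """Time not explained by the slowest shard's own QTime."""
    return distributed_ms - max(shard_qtimes_ms)


# Hypothetical core name; substitute the real core for each of the 96 shards.
print(shard_query_url("http://host:8983/solr/v9_shard1_replica_n1"))

# Figures from this thread: ~500 ms distributed, 0-25 ms per shard.
print(aggregation_overhead_ms(500, [0, 12, 25]))
```

The per-shard numbers are only a lower bound: a distributed request also
makes a second round trip to fetch the stored fields for the final rows,
which a "distrib=false" query does not show separately.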


@Michael Gibney, could you please provide me with an example query to test
the improvement implemented as part of SOLR-14765
<https://issues.apache.org/jira/browse/SOLR-14765>?

@Joel, thank you for that tip. The bottleneck seems to be the aggregator,
and queries that match a large set of documents, such as "*" itself.

The memory (RAM) on the nodes is the same as the index size, so it is not a
memory/CPU/resource issue, and the heap is set to 25% of RAM.  A query
"q=*&fl=id" also has ~500 ms latency. An edismax query with "qf" set to
"keywords, title, description" that matches a large set of documents takes
~2-3 seconds. Any "boost" applied to it adds 2 more seconds. I am not sure
whether shard size is the problem: there are ~5m docs and a 60 GB index per
shard, though the RAM on the node is 128 GB.
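
To isolate the cost of the boost, the same edismax query can be run with
and without it. Below is a minimal sketch of the two parameter sets. The
search term "laptop", the "publish_date" field, and the recency formula are
assumptions for illustration only; the formula follows the common
recip(ms(NOW,...)) pattern and is not the boost actually used on this
collection.

```python
from urllib.parse import urlencode


def edismax_params(user_query, boost=None, rows=10):
    """Build edismax request parameters, optionally with a multiplicative boost."""
    params = {
        "defType": "edismax",
        "q": user_query,
        "qf": "keywords title description",  # fields named in this thread
        "fl": "id",
        "rows": rows,
        "debug": "timing",  # report per-component timings
    }
    if boost is not None:
        params["boost"] = boost
    return params


# Hypothetical query term and date field.
baseline = edismax_params("laptop")
boosted = edismax_params(
    "laptop", boost="recip(ms(NOW/HOUR,publish_date),3.16e-11,1,1)"
)

print(urlencode(baseline))
print(urlencode(boosted))
```

Comparing the timings of the two requests on one shard with
"distrib=false" would show whether the extra ~2 seconds comes from
evaluating the boost function on every matching document.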

Appreciate any suggestions for optimizing query latency.



On Thu, Apr 20, 2023 at 7:30 AM Michael Gibney <mich...@michaelgibney.net>
wrote:

> > It is a query with popularity and recency boosts, requesting the first 100
> > docs with 3 fields per doc.
>
> It sounds like you are scoring/sorting, so the optimization that
> Mikhail mentioned would not apply (your use-case is not
> "sort-irrelevant"). Can you share more about specifically how you're
> implementing/invoking your popularity/recency boosts, and how you're
> applying the "with three fields per doc" requirement?
>
> On Wed, Apr 19, 2023 at 5:23 PM Joel Bernstein <joels...@gmail.com> wrote:
> >
> > To send the query to a single shard you can add the parameter
> > "distrib=false" to the query and it will stay on that shard.
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Wed, Apr 19, 2023 at 5:21 PM Joel Bernstein <joels...@gmail.com> wrote:
> >
> > > > You're hunting for a bottleneck. Here is how I would go about finding it:
> > > >
> > > > First I would run the query on a single shard and see how long it takes.
> > > > If the single shard is slow you've found your bottleneck. If it's fast then
> > > > try the same query on each shard; one of the shards might be slow and
> > > > you've found your bottleneck.
> > > >
> > > > If all the shards are fast then it would seem the bottleneck is the
> > > > aggregator node.
> > > >
> > > > Once you've found the bottleneck then you need to start improving the
> > > > throughput. Let us know what you find and then we can move on to discuss
> > > > how to improve the throughput at the bottleneck.
> > >
> > > If its very fast thats
> > >
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> > > On Wed, Apr 19, 2023 at 3:57 PM Rajani Maski <rajinima...@gmail.com>
> > > wrote:
> > >
> > >> Thank you, Mikhail.
> > >>
> > >>
> > >> On Wed, Apr 19, 2023 at 7:59 AM Mikhail Khludnev <m...@apache.org> wrote:
> > >>
> > >> > Hello, Rajani.
> > >> > I meant [SOLR-14765] optimize DocList creation by skipping sort for
> > >> > sort-irrelevant cases - ASF JIRA (apache.org)
> > >> > <https://issues.apache.org/jira/browse/SOLR-14765>
> > >> >
> > >> > On Wed, Apr 19, 2023 at 4:05 AM Rajani Maski <rajinima...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hi Mikhail,
> > >> > >
> > >> > >    Yes, 9.1.1, that should be helpful, can you please point me to the
> > >> > > related jira(s) and/or docs?
> > >> > >
> > >> > > Thank you,
> > >> > > Rajani
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Mon, Apr 17, 2023 at 2:09 AM Mikhail Khludnev <m...@apache.org>
> > >> > > wrote:
> > >> > >
> > >> > > > Hello Rajani.
> > >> > > > Which version are you running? IIRC 9.1.2 has some
> > >> > > > improvement about caching short queries.
> > >> > > >
> > >> > > > On Sun, Apr 16, 2023 at 4:25 PM Rajani Maski <rajinima...@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi Solr Users,
> > >> > > > >
> > >> > > > > What are your suggestions to improve star queries latencies? By star
> > >> > > > > queries I mean "*:*" or single term queries having boost formulas (such as
> > >> > > > > doc recency and many others) taking 10 or more seconds. It is a large
> > >> > > > > collection with good compute resources, however I am guessing this may be
> > >> > > > > because each shard has too many documents and I noticed per shard response
> > >> > > > > time also is high.
> > >> > > > >
> > >> > > > > Splitting shards could be an option however it is already an
> > >> > > > > evenly distributed, composite router, 96 shards collection, I am
> > >> > > > > concerned that more than 100 shards per collection can lead to exhaustively
> > >> > > > > searching too many shards and aggregation issues. What are your thoughts?
> > >> > > > >
> > >> > > > > Can we make use of any caches, query result cache or other caches, in solr
> > >> > > > > that allows warming up and persisting these queries results in ram, and
> > >> > > > > that maybe helps reduce this query time?
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Rajani
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > Sincerely yours
> > >> > > > Mikhail Khludnev
> > >> > > > https://t.me/MUST_SEARCH
> > >> > > > A caveat: Cyrillic!
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > Sincerely yours
> > >> > Mikhail Khludnev
> > >> > https://t.me/MUST_SEARCH
> > >> > A caveat: Cyrillic!
> > >> >
> > >>
> > >
>
