Re: Suggestions to improve Star queries latencies

2023-04-19 Thread ufuk yılmaz
Do you really have 96 separate disks and memory for each shard? The shards
seem a bit small and numerous to me, unless you are trying to fit every shard
into the memory of a separate node and have the hardware resources for it.

—

> On 19 Apr 2023, at 05:43, Rajani Maski  wrote:
> 
> It is a query with popularity and recency boosts, requesting the first 100
> docs with 3 fields per doc. No facets. It is a query against a collection
> of 96 shards ~7m docs per shard.  Could the cause for latency be boost
> queries and would it also be time spent in aggregating results from many
> shards? Curious to learn more about caching such/short queries that @Mikhail
> mentioned.
> 
>> On Tue, Apr 18, 2023 at 9:44 PM Dave  wrote:
>> 
>> I think there are more important questions here.  What do you want with a
>> *:* query?  Do you want all the results in one return? Or do you just want
>> the count of total documents? Or to put the results in facets?  *:* should
>> never take long unless you are requesting every single document not just
>> the first ten.
>> 
>>> On Apr 18, 2023, at 9:05 PM, Rajani Maski  wrote:
>>> 
>>> Hi Mikhail,
>>> 
>>>  Yes, 9.1.1, that should be helpful, can you please point me to the
>>> related jira(s) and/or docs?
>>> 
>>> Thank you,
>>> Rajani
>>> 
>>> 
>>> 
>>>> On Mon, Apr 17, 2023 at 2:09 AM Mikhail Khludnev wrote:
>>>> 
>>>> Hello Rajani.
>>>> Which version are you running? IIRC 9.1.2 has some
>>>> improvement about caching short queries.
>>>> 
>>>> On Sun, Apr 16, 2023 at 4:25 PM Rajani Maski wrote:
>>>> 
>>>>> Hi Solr Users,
>>>>> 
>>>>> What are your suggestions to improve star queries latencies? By star
>>>>> queries I mean "*:*" or single term queries having boost formulas (such
>>>>> as doc recency and many others) taking 10 or more seconds. It is a large
>>>>> collection with good compute resources, however I am guessing this may be
>>>>> because each shard has too many documents and I noticed per shard
>>>>> response time also is high.
>>>>> 
>>>>> Splitting shards could be an option however it is already an
>>>>> evenly distributed, composite router, 96 shards collection, I am
>>>>> concerned that more than 100 shards per collection can lead to
>>>>> exhaustively searching too many shards and aggregation issues. What are
>>>>> your thoughts?
>>>>> 
>>>>> Can we make use of any caches, query result cache or other caches, in
>>>>> solr that allows warming up and persisting these queries results in ram,
>>>>> and that maybe helps reduce this query time?
>>>>> 
>>>>> Thanks,
>>>>> Rajani
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>> https://t.me/MUST_SEARCH
>>>> A caveat: Cyrillic!
>>>> 
>> 



Re: Suggestions to improve Star queries latencies

2023-04-19 Thread Mikhail Khludnev
Hello, Rajani.
I meant SOLR-14765, "optimize DocList creation by skipping sort for
sort-irrelevant cases", in the ASF JIRA (apache.org).


On Wed, Apr 19, 2023 at 4:05 AM Rajani Maski  wrote:

> Hi Mikhail,
>
>Yes, 9.1.1, that should be helpful, can you please point me to the
> related jira(s) and/or docs?
>
> Thank you,
> Rajani
>
>
>
> On Mon, Apr 17, 2023 at 2:09 AM Mikhail Khludnev  wrote:
>
> > Hello Rajani.
> > Which version are you running? IIRC 9.1.2 has some
> > improvement about caching short queries.
> >
> > On Sun, Apr 16, 2023 at 4:25 PM Rajani Maski 
> > wrote:
> >
> > > Hi Solr Users,
> > >
> > > What are your suggestions to improve star queries latencies? By star
> > > queries I mean "*:*" or single term queries having boost formulas
> (such
> > as
> > > doc recency and many others) taking 10 or more seconds. It is a large
> > > collection with good compute resources, however I am guessing this may
> be
> > > because each shard has too many documents and I noticed per shard
> > response
> > > time also is high.
> > >
> > > Splitting shards could be an option however it is already an
> > > evenly distributed, composite router, 96 shards collection, I am
> > > concerned that more than 100 shards per collection can lead to
> > exhaustively
> > > searching too many shards and aggregation issues. What are your
> thoughts?
> > >
> > > Can we make use of any caches, query result cache or other caches, in
> > solr
> > > that allows warming up and persisting these queries results in ram, and
> > > that maybe helps reduce this query time?
> > >
> > > Thanks,
> > > Rajani
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > https://t.me/MUST_SEARCH
> > A caveat: Cyrillic!
> >
>


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!


Debug time spent in aggregating the search results

2023-04-19 Thread Rajani Maski
Hi Solr Users,

Is there a metric endpoint or a debug/explain type query param that
returns average time spent in aggregating the search results from shards?

Thanks,
Rajani


Re: Suggestions to improve Star queries latencies

2023-04-19 Thread Rajani Maski
Thank you, Mikhail.


On Wed, Apr 19, 2023 at 7:59 AM Mikhail Khludnev  wrote:

> Hello, Rajani.
> I meant [SOLR-14765] optimize DocList creation by skipping sort for
> sort-irrelevant cases - ASF JIRA (apache.org)
> 
>
> On Wed, Apr 19, 2023 at 4:05 AM Rajani Maski 
> wrote:
>
> > Hi Mikhail,
> >
> >Yes, 9.1.1, that should be helpful, can you please point me to the
> > related jira(s) and/or docs?
> >
> > Thank you,
> > Rajani
> >
> >
> >
> > On Mon, Apr 17, 2023 at 2:09 AM Mikhail Khludnev 
> wrote:
> >
> > > Hello Rajani.
> > > Which version are you running? IIRC 9.1.2 has some
> > > improvement about caching short queries.
> > >
> > > On Sun, Apr 16, 2023 at 4:25 PM Rajani Maski 
> > > wrote:
> > >
> > > > Hi Solr Users,
> > > >
> > > > What are your suggestions to improve star queries latencies? By star
> > > > queries I mean "*:*" or single term queries having boost formulas
> > (such
> > > as
> > > > doc recency and many others) taking 10 or more seconds. It is a large
> > > > collection with good compute resources, however I am guessing this
> may
> > be
> > > > because each shard has too many documents and I noticed per shard
> > > response
> > > > time also is high.
> > > >
> > > > Splitting shards could be an option however it is already an
> > > > evenly distributed, composite router, 96 shards collection, I am
> > > > concerned that more than 100 shards per collection can lead to
> > > exhaustively
> > > > searching too many shards and aggregation issues. What are your
> > thoughts?
> > > >
> > > > Can we make use of any caches, query result cache or other caches, in
> > > solr
> > > > that allows warming up and persisting these queries results in ram,
> and
> > > > that maybe helps reduce this query time?
> > > >
> > > > Thanks,
> > > > Rajani
> > > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > https://t.me/MUST_SEARCH
> > > A caveat: Cyrillic!
> > >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!
>


Re: Suggestions to improve Star queries latencies

2023-04-19 Thread Joel Bernstein
You're hunting for a bottleneck. Here is how I would go about finding it:

First I would run the query on a single shard and see how long it takes. If
the single shard is slow you've found your bottleneck. If it's fast then try
the same query on each shard; one of the shards might be slow, in which case
you've found your bottleneck.

If all the shards are fast then it would seem the bottleneck is the
aggregator node.

Once you've found the bottleneck then you need to start improving the
throughput. Let us know what you find and then we can move on to discuss
how to improve the throughput at the bottleneck.
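
The decision procedure above could be sketched roughly as follows. This is
only an illustration, not something from the thread: the function name, the
shard names, and the 1000 ms "slow" threshold are all assumptions, and the
timings are expected to come from queries you have already run yourself.

```python
def find_bottleneck(shard_ms, distributed_ms, slow_ms=1000.0):
    """Classify where the time goes, given measured latencies.

    shard_ms:       dict of shard name -> elapsed ms for a single-shard query
    distributed_ms: elapsed ms for the same query across the whole collection
    slow_ms:        hypothetical threshold for calling a query "slow"
    """
    if not shard_ms:
        raise ValueError("need at least one per-shard timing")

    # Shards whose own query time already exceeds the threshold.
    slow = sorted(name for name, ms in shard_ms.items() if ms >= slow_ms)

    if len(slow) == len(shard_ms):
        return "every shard is slow", slow
    if slow:
        return "some shards are slow", slow
    if distributed_ms >= slow_ms:
        # All shards are fast, yet the full query is slow: look at the
        # node that fans out the requests and merges the responses.
        return "aggregator node", []
    return "no bottleneck found", []
```

For example, find_bottleneck({"shard1": 40, "shard2": 55}, 12000) points at
the aggregator node, while a single shard at 9000 ms would be reported by
name.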

If its very fast thats



Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Apr 19, 2023 at 3:57 PM Rajani Maski  wrote:

> Thank you, Mikhail.
>
>
> On Wed, Apr 19, 2023 at 7:59 AM Mikhail Khludnev  wrote:
>
> > Hello, Rajani.
> > I meant [SOLR-14765] optimize DocList creation by skipping sort for
> > sort-irrelevant cases - ASF JIRA (apache.org)
> > 
> >
> > On Wed, Apr 19, 2023 at 4:05 AM Rajani Maski 
> > wrote:
> >
> > > Hi Mikhail,
> > >
> > >Yes, 9.1.1, that should be helpful, can you please point me to the
> > > related jira(s) and/or docs?
> > >
> > > Thank you,
> > > Rajani
> > >
> > >
> > >
> > > On Mon, Apr 17, 2023 at 2:09 AM Mikhail Khludnev 
> > wrote:
> > >
> > > > Hello Rajani.
> > > > Which version are you running? IIRC 9.1.2 has some
> > > > improvement about caching short queries.
> > > >
> > > > On Sun, Apr 16, 2023 at 4:25 PM Rajani Maski 
> > > > wrote:
> > > >
> > > > > Hi Solr Users,
> > > > >
> > > > > What are your suggestions to improve star queries latencies? By
> star
> > > > > queries I mean "*:*" or single term queries having boost formulas
> > > (such
> > > > as
> > > > > doc recency and many others) taking 10 or more seconds. It is a
> large
> > > > > collection with good compute resources, however I am guessing this
> > may
> > > be
> > > > > because each shard has too many documents and I noticed per shard
> > > > response
> > > > > time also is high.
> > > > >
> > > > > Splitting shards could be an option however it is already an
> > > > > evenly distributed, composite router, 96 shards collection, I am
> > > > > concerned that more than 100 shards per collection can lead to
> > > > exhaustively
> > > > > searching too many shards and aggregation issues. What are your
> > > thoughts?
> > > > >
> > > > > Can we make use of any caches, query result cache or other caches,
> in
> > > > solr
> > > > > that allows warming up and persisting these queries results in ram,
> > and
> > > > > that maybe helps reduce this query time?
> > > > >
> > > > > Thanks,
> > > > > Rajani
> > > > >
> > > >
> > > >
> > > > --
> > > > Sincerely yours
> > > > Mikhail Khludnev
> > > > https://t.me/MUST_SEARCH
> > > > A caveat: Cyrillic!
> > > >
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > https://t.me/MUST_SEARCH
> > A caveat: Cyrillic!
> >
>


Re: Suggestions to improve Star queries latencies

2023-04-19 Thread Joel Bernstein
To send the query to a single shard you can add the parameter
"distrib=false" to the query and it will stay on that shard.
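
A minimal sketch of timing such a single-shard request from Python, using
only the standard library. The host, port and core name in the usage note
are hypothetical placeholders; the only things taken from Solr itself are
the distrib=false parameter and the QTime value in the response header.

```python
import json
import time
import urllib.parse
import urllib.request


def build_shard_url(core_url, params):
    """Return a /select URL that stays on one core (distrib=false)."""
    # The merge overrides any distrib value the caller may have passed.
    query = urllib.parse.urlencode({**params, "distrib": "false"})
    return f"{core_url.rstrip('/')}/select?{query}"


def time_shard_query(core_url, params, timeout=60):
    """Run the query against one core; return (QTime ms, wall-clock ms)."""
    url = build_shard_url(core_url, {**params, "wt": "json"})
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    wall_ms = (time.perf_counter() - start) * 1000.0
    # QTime is Solr's own measurement; wall_ms also includes the network
    # round trip and the time to read the response.
    return body["responseHeader"]["QTime"], wall_ms
```

Usage would look something like
time_shard_query("http://localhost:8983/solr/mycoll_shard1_replica_n1",
{"q": "*:*", "rows": 100}), repeated once per shard with the boost
parameters of the real query added to the dict.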


Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Apr 19, 2023 at 5:21 PM Joel Bernstein  wrote:

> You're hunting for a bottleneck. Here is how I would go about finding it:
>
> First I would run the query on a single shard and see how long it takes.
> If the single shard is slow you've found your bottleneck. If its fast then
> try the same query on each shard, one of the shards might be slow and
> you've found your bottleneck.
>
> If all the shards are fast then it would seem the bottleneck is the
> aggregator node.
>
> Once you've found the bottleneck then you need to start improving the
> throughput. Let us know what you find and then we can move on to discuss
> how to improve the throughput at the bottleneck.
>
> If its very fast thats
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, Apr 19, 2023 at 3:57 PM Rajani Maski 
> wrote:
>
>> Thank you, Mikhail.
>>
>>
>> On Wed, Apr 19, 2023 at 7:59 AM Mikhail Khludnev  wrote:
>>
>> > Hello, Rajani.
>> > I meant [SOLR-14765] optimize DocList creation by skipping sort for
>> > sort-irrelevant cases - ASF JIRA (apache.org)
>> > 
>> >
>> > On Wed, Apr 19, 2023 at 4:05 AM Rajani Maski 
>> > wrote:
>> >
>> > > Hi Mikhail,
>> > >
>> > >Yes, 9.1.1, that should be helpful, can you please point me to the
>> > > related jira(s) and/or docs?
>> > >
>> > > Thank you,
>> > > Rajani
>> > >
>> > >
>> > >
>> > > On Mon, Apr 17, 2023 at 2:09 AM Mikhail Khludnev 
>> > wrote:
>> > >
>> > > > Hello Rajani.
>> > > > Which version are you running? IIRC 9.1.2 has some
>> > > > improvement about caching short queries.
>> > > >
>> > > > On Sun, Apr 16, 2023 at 4:25 PM Rajani Maski
>> > > > wrote:
>> > > >
>> > > > > Hi Solr Users,
>> > > > >
>> > > > > What are your suggestions to improve star queries latencies? By
>> star
>> > > > > queries I mean "*:*" or single term queries having boost formulas
>> > > (such
>> > > > as
>> > > > > doc recency and many others) taking 10 or more seconds. It is a
>> large
>> > > > > collection with good compute resources, however I am guessing this
>> > may
>> > > be
>> > > > > because each shard has too many documents and I noticed per shard
>> > > > response
>> > > > > time also is high.
>> > > > >
>> > > > > Splitting shards could be an option however it is already an
>> > > > > evenly distributed, composite router, 96 shards collection, I am
>> > > > > concerned that more than 100 shards per collection can lead to
>> > > > exhaustively
>> > > > > searching too many shards and aggregation issues. What are your
>> > > thoughts?
>> > > > >
>> > > > > Can we make use of any caches, query result cache or other
>> caches, in
>> > > > solr
>> > > > > that allows warming up and persisting these queries results in
>> ram,
>> > and
>> > > > > that maybe helps reduce this query time?
>> > > > >
>> > > > > Thanks,
>> > > > > Rajani
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Sincerely yours
>> > > > Mikhail Khludnev
>> > > > https://t.me/MUST_SEARCH
>> > > > A caveat: Cyrillic!
>> > > >
>> > >
>> >
>> >
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>> > https://t.me/MUST_SEARCH
>> > A caveat: Cyrillic!
>> >
>>
>


Re: Debug time spent in aggregating the search results

2023-04-19 Thread Chris Hostetter
: Hi Solr Users,
: 
: Is there a metric endpoint or a debug/explain type query param that
: returns average time spent in aggregating the search results from shards?

Sort of?

Metrics like "QUERY./select.distrib.requestTimes" tell you the stats on
handling a "distributed" request -- which is when a core is responsible for
sending out "per-shard" requests and merging the responses.

But it doesn't *only* include the "time spent in aggregating the search
results from shards" ... it also includes the time spent determining which
requests to send to which shards, and waiting for the responses to those
(frequently concurrent) requests.
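
As a rough sketch of pulling that metric out of a Metrics API response:
the helper below assumes you have already fetched something like
/solr/admin/metrics?group=core&prefix=QUERY./select.distrib.requestTimes
and decoded the JSON into a dict. The nesting under "metrics" keyed by
registry name, and the count / mean_ms / p95_ms field names, are
assumptions about the response layout that should be checked against what
your Solr version actually returns.

```python
def distrib_request_times(metrics_response,
                          key="QUERY./select.distrib.requestTimes"):
    """Map each core registry to a summary of its distributed request timer.

    metrics_response is the decoded JSON body of a Metrics API call
    (layout assumed, see the note above).
    """
    result = {}
    for registry, values in metrics_response.get("metrics", {}).items():
        timer = values.get(key)
        # Skip registries that do not report this timer as a stats object.
        if isinstance(timer, dict):
            result[registry] = {
                "count": timer.get("count"),
                "mean_ms": timer.get("mean_ms"),
                "p95_ms": timer.get("p95_ms"),
            }
    return result
```

Keep in mind the caveat above: these numbers cover the whole distributed
request, including waiting on the per-shard responses, not just the merge.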


-Hoss
http://www.lucidworks.com/