An edge n-gram tokenizer could be useful. That would shift query-time compute to
index time at the cost of a bigger index.
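To make that trade-off concrete: edge n-gramming emits the leading prefixes of each term (between a minimum and a maximum gram length) at index time, so a prefix lookup can become an exact term match at query time, at the price of storing more terms. The sketch below is a plain-Python illustration that assumes gram sizes of 1 to 4 and naive whitespace tokenization. The helper names are hypothetical, and this is not Lucene's EdgeNGramTokenizer or EdgeNGramFilter code.

```python
def edge_ngrams(term: str, min_gram: int, max_gram: int) -> list[str]:
    """Return the leading prefixes of `term` with lengths min_gram..max_gram.

    Conceptual sketch of edge n-gramming; not Lucene's implementation.
    """
    if min_gram < 1 or max_gram < min_gram:
        raise ValueError("require 1 <= min_gram <= max_gram")
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


def index_terms(text: str, min_gram: int, max_gram: int) -> list[str]:
    # Lowercase + whitespace split stands in for a real analyzer chain.
    terms: list[str] = []
    for token in text.lower().split():
        terms.extend(edge_ngrams(token, min_gram, max_gram))
    return terms


if __name__ == "__main__":
    print(edge_ngrams("search", 1, 4))  # ['s', 'se', 'sea', 'sear']
    # More terms stored per document is the "bigger index" cost.
    print(len("solr search".split()), "->", len(index_terms("solr search", 1, 4)))
```

Two input words become eight indexed terms here. That growth is what buys the cheaper query-time matching.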

> On Nov 1, 2023, at 7:02 PM, rajani m <rajinima...@gmail.com> wrote:
> 
> Sorry, it took too long to get back to this one.
> 
> The search query "http://host:8983/solr/v9/select?&q=*&rows=10" consistently
> took ~500 ms. With "distrib=false", each of the 96 shards has a QTime of
> 0-25 ms. Does this mean aggregating the results from all the shards is
> taking ~475 ms? I also tried shards.rows=5, and the query time was still
> ~475 ms. I am assuming the sort for a star query is by score; is that adding
> to such high latency? Why would aggregation take so long? When I do "debug=true"
> 
> @Michael Gibney, could you please provide me with an example query to test
> the improvement implemented as part of SOLR-14765
> <https://issues.apache.org/jira/browse/SOLR-14765>
> 
> @Joel, thank you for that tip. The bottleneck seems to be the aggregator and
> queries matching a large set of documents, or "*" itself.
> 
> The memory (RAM) on the nodes is the same as the index size, so it is not a
> memory/CPU/resource issue, and the heap is set to 25% of RAM. A query
> "q=*&fl=id" also has ~500 ms latency. An edismax query with qf on the
> keywords, title, and description fields that matches a large set of
> documents takes ~2-3 seconds, and any boost applied to it adds 2 more
> seconds. I am not sure whether shard size is the problem: there are ~5M
> docs and a 60 GB index per shard, though the RAM on the node is 128 GB.
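For reference, a recency boost is commonly written with Solr's recip() and ms() functions, following the pattern recip(ms(NOW,field),3.16e-11,1,1) shown in Solr's function query documentation, where recip(x,m,a,b) = a/(m*x + b). The sketch below assumes a hypothetical date field named last_modified, a hypothetical query "laptop", and a multiplicative edismax boost parameter. It builds the request parameters and evaluates the same formula in Python to show the decay, which is roughly 0.5 for a one-year-old document. The actual boost configuration used in this thread is not shown, so this is only an illustration.

```python
from urllib.parse import urlencode

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # ~3.156e10 ms


def recip(x: float, m: float, a: float, b: float) -> float:
    """Same formula as Solr's recip(x,m,a,b) function: a / (m*x + b)."""
    return a / (m * x + b)


def edismax_params(q: str, rows: int = 100) -> dict[str, str]:
    # "last_modified" is a hypothetical date field; substitute your own.
    return {
        "q": q,
        "defType": "edismax",
        "qf": "keywords title description",  # qf fields are whitespace-separated
        "boost": "recip(ms(NOW,last_modified),3.16e-11,1,1)",  # multiplied into score
        "rows": str(rows),
        "fl": "id",
    }


if __name__ == "__main__":
    print(urlencode(edismax_params("laptop")))
    for years in (0, 1, 5):
        # Boost factor for a document of the given age.
        print(years, round(recip(years * MS_PER_YEAR, 3.16e-11, 1, 1), 3))
```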
> 
> I'd appreciate any suggestions for reducing the query latency.
> 
> 
> 
> On Thu, Apr 20, 2023 at 7:30 AM Michael Gibney <mich...@michaelgibney.net>
> wrote:
> 
>>> It is a query with popularity and recency boosts, requesting the first
>>> 100 docs with 3 fields per doc.
>> 
>> It sounds like you are scoring/sorting, so the optimization that
>> Mikhail mentioned would not apply (your use case is not
>> "sort-irrelevant"). Can you share more about specifically how you're
>> implementing/invoking your popularity/recency boosts, and how you're
>> applying the "with three fields per doc" requirement?
>> 
>> On Wed, Apr 19, 2023 at 5:23 PM Joel Bernstein <joels...@gmail.com> wrote:
>>> 
>>> To send the query to a single shard, you can add the parameter
>>> "distrib=false" to the query, and it will stay on that shard.
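A minimal standard-library sketch of that tip: it sends the same query to individual shard cores with distrib=false and reads QTime from the responseHeader. The core URLs are hypothetical placeholders (the Collections API CLUSTERSTATUS action lists the real replicas), and the helper names are also hypothetical. The demo only prints the URLs, so it runs without a live cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def shard_query_url(core_url: str, params: dict[str, str]) -> str:
    """Build a select URL that keeps the query on one core (distrib=false)."""
    merged = dict(params, distrib="false", wt="json")
    return f"{core_url.rstrip('/')}/select?{urlencode(merged)}"


def shard_qtime_ms(core_url: str, params: dict[str, str], timeout: float = 30.0) -> int:
    """Run the query on a single core and return responseHeader.QTime."""
    with urlopen(shard_query_url(core_url, params), timeout=timeout) as resp:
        body = json.load(resp)
    return int(body["responseHeader"]["QTime"])


if __name__ == "__main__":
    # Hypothetical core URLs; look up the real ones via CLUSTERSTATUS.
    cores = [
        "http://host:8983/solr/v9_shard1_replica_n1",
        "http://host:8983/solr/v9_shard2_replica_n2",
    ]
    for core in cores:
        print(shard_query_url(core, {"q": "*", "rows": "10"}))
```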
>>> 
>>> 
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>> 
>>> 
>>> On Wed, Apr 19, 2023 at 5:21 PM Joel Bernstein <joels...@gmail.com>
>>> wrote:
>>> 
>>>> You're hunting for a bottleneck. Here is how I would go about finding it:
>>>>
>>>> First, I would run the query on a single shard and see how long it takes.
>>>> If the single shard is slow, you've found your bottleneck. If it's fast,
>>>> try the same query on each shard; if one of the shards is slow, you've
>>>> found your bottleneck.
>>>>
>>>> If all the shards are fast, then the bottleneck would seem to be the
>>>> aggregator node.
>>>>
>>>> Once you've found the bottleneck, you need to start improving the
>>>> throughput. Let us know what you find, and then we can move on to
>>>> discussing how to improve the throughput at the bottleneck.
>>>> 
>>>> If it's very fast, that's
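The triage steps described in this message can be expressed as a small decision function over measured timings. This is a sketch that assumes you have already collected per-shard QTimes (for example with distrib=false) and the end-to-end latency. The function name and the 100 ms threshold in the demo are hypothetical.

```python
def locate_bottleneck(
    total_ms: float,
    shard_qtimes_ms: dict[str, float],
    slow_ms: float,
) -> str:
    """Apply the single-shard / every-shard / aggregator triage to timings.

    `slow_ms` is a hypothetical threshold: pick whatever counts as "slow"
    for your latency budget.
    """
    if not shard_qtimes_ms:
        raise ValueError("measure at least one shard first")
    slow = sorted(name for name, t in shard_qtimes_ms.items() if t >= slow_ms)
    if slow:
        return "slow shard(s): " + ", ".join(slow)
    if total_ms >= slow_ms:
        # Every shard is fast on its own, yet the full request is slow.
        return "aggregator node"
    return "no bottleneck at this threshold"


if __name__ == "__main__":
    shards = {f"shard{i}": 25.0 for i in range(1, 97)}  # 96 fast shards
    print(locate_bottleneck(500.0, shards, slow_ms=100.0))  # aggregator node
```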
>>>> 
>>>> On Wed, Apr 19, 2023 at 3:57 PM Rajani Maski <rajinima...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Thank you, Mikhail.
>>>>> 
>>>>> 
>>>>> On Wed, Apr 19, 2023 at 7:59 AM Mikhail Khludnev <m...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> Hello, Rajani.
>>>>>> I meant [SOLR-14765] "optimize DocList creation by skipping sort for
>>>>>> sort-irrelevant cases" (ASF JIRA):
>>>>>> <https://issues.apache.org/jira/browse/SOLR-14765>
>>>>>> 
>>>>>> On Wed, Apr 19, 2023 at 4:05 AM Rajani Maski <rajinima...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Mikhail,
>>>>>>> 
>>>>>>> Yes, 9.1.1. That should be helpful; can you please point me to the
>>>>>>> related JIRA(s) and/or docs?
>>>>>>> 
>>>>>>> Thank you,
>>>>>>> Rajani
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Apr 17, 2023 at 2:09 AM Mikhail Khludnev <m...@apache.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hello Rajani.
>>>>>>>> Which version are you running? IIRC, 9.1.2 has some
>>>>>>>> improvements around caching short queries.
>>>>>>>> 
>>>>>>>> On Sun, Apr 16, 2023 at 4:25 PM Rajani Maski <rajinima...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Solr Users,
>>>>>>>>> 
>>>>>>>>> What are your suggestions for improving the latency of star queries?
>>>>>>>>> By star queries I mean "*:*" or single-term queries with boost
>>>>>>>>> formulas (such as doc recency and many others) that take 10 or more
>>>>>>>>> seconds. It is a large collection with good compute resources;
>>>>>>>>> however, I am guessing this may be because each shard has too many
>>>>>>>>> documents, and I noticed that the per-shard response time is also high.
>>>>>>>>> 
>>>>>>>>> Splitting shards could be an option; however, it is already an
>>>>>>>>> evenly distributed, composite-router, 96-shard collection, and I am
>>>>>>>>> concerned that more than 100 shards per collection can lead to
>>>>>>>>> exhaustively searching too many shards and to aggregation issues.
>>>>>>>>> What are your thoughts?
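For context on the aggregation concern: in Solr's distributed search, each shard first returns its own top-ranked document ids and sort values (start + rows per shard by default, or shards.rows if set), and the coordinating node merges those lists before fetching stored fields for the winners. With 96 shards and rows=100, that is up to 9,600 candidates per request, and the count grows linearly with the shard count. The toy merge below uses heapq.nlargest over made-up scores to illustrate the idea. It is not Solr's merge implementation, and the function name is hypothetical.

```python
import heapq


def merge_top_k(
    shard_results: dict[str, list[tuple[float, str]]], k: int
) -> tuple[list[tuple[float, str, str]], int]:
    """Merge per-shard (score, doc_id) lists into a global top-k by score.

    Returns the top-k as (score, doc_id, shard) tuples plus the number of
    candidates examined. Equal scores keep their input order.
    """
    candidates = [
        (score, doc_id, shard)
        for shard, hits in shard_results.items()
        for score, doc_id in hits
    ]
    top = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return top, len(candidates)


if __name__ == "__main__":
    rows, num_shards = 100, 96
    # Each hypothetical shard returns its local top `rows` hits with made-up scores.
    shard_results = {
        f"shard{s}": [(1.0 / (s + r + 1), f"doc-{s}-{r}") for r in range(rows)]
        for s in range(num_shards)
    }
    top, examined = merge_top_k(shard_results, rows)
    print(examined)     # 9600 candidates merged to return 100 results
    print(top[0][1:])   # ('doc-0-0', 'shard0')
```

With 5 rows per shard instead of 100, the same merge would see 96 × 5 = 480 candidates.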
>>>>>>>>> 
>>>>>>>>> Can we make use of any caches in Solr (the query result cache or
>>>>>>>>> others) that allow warming up and keeping these query results in
>>>>>>>>> RAM, and that might help reduce this query time?
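On the caching question: per the Solr Reference Guide, the queryResultCache holds ordered lists of document ids based on a query, a sort, and the range of documents requested, and it can be autowarmed when a new searcher opens. In a sharded collection, each core keeps its own caches. Because a lookup needs a matching key, the cache mainly helps when the same requests repeat. The toy LRU below is a hypothetical illustration of that keying idea only. It does not model Solr's queryResultWindowSize, autowarming, or invalidation on commit.

```python
from collections import OrderedDict
from typing import Optional

CacheKey = tuple[str, str, tuple[str, ...]]


class ToyResultCache:
    """Hypothetical LRU map from (query, sort, filters) to ordered doc ids."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: OrderedDict[CacheKey, list[str]] = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(q: str, sort: str, filters: tuple[str, ...] = ()) -> CacheKey:
        # This toy sorts filters so that their order does not change the key.
        return (q, sort, tuple(sorted(filters)))

    def get(self, key: CacheKey) -> Optional[list[str]]:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        return None

    def put(self, key: CacheKey, doc_ids: list[str]) -> None:
        self._entries[key] = doc_ids
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


if __name__ == "__main__":
    cache = ToyResultCache(capacity=2)
    k = ToyResultCache.key("*:*", "score desc")
    if cache.get(k) is None:             # first request: miss
        cache.put(k, ["doc1", "doc2"])   # stand-in for executing the query
    cache.get(k)                         # identical repeat: hit
    print(cache.hits, cache.misses)      # 1 1
```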
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Rajani
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Sincerely yours
>>>>>>>> Mikhail Khludnev
>>>>>>>> https://t.me/MUST_SEARCH
>>>>>>>> A caveat: Cyrillic!
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
