[jira] [Commented] (SOLR-17447) Add support for maxHits

Eric Pugh (Jira) Thu, 12 Sep 2024 03:03:05 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-17447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881255#comment-17881255
 ]


Eric Pugh commented on SOLR-17447:
----------------------------------

If your number one requirement is latency (which makes sense for your use 
case), shouldn't we just make sure *timeAllowed* works? I'm guessing that you 
are aiming for a very low latency, maybe 50ms? So the key is to make sure 
that everything returns very quickly.

I *think* you are suggesting that you need to control how long the search on 
each shard takes, and that *timeAllowed* doesn't do that, so you want each 
shard to collect only *maxHits* docs, respond, and let the aggregation finish 
so that the user gets an answer within a given time. Would it be better to 
just make sure each shard returns within a certain time (and not worry about 
how many hits are matched)? I am thinking of something more like a 
*timeAllowedByShard* parameter that the shard uses to decide when to return. 
I worry that a *maxHits* approach might take longer than you want to 
accumulate hits, and that your number one goal is latency, not hits.
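
To make the trade-off concrete, here is a minimal toy sketch in Python. It is 
not Solr code: the function name, its arguments, and the injected clock are 
all assumptions made for illustration, and *timeAllowedByShard* remains only 
a suggested name. It shows a per-shard loop that stops at whichever limit is 
reached first, a hit count or a time budget.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


def collect(
    doc_ids: Iterable[int],
    max_hits: Optional[int] = None,
    time_allowed_ms: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[int], bool]:
    """Toy per-shard collector (hypothetical, not Solr's implementation).

    Returns (hits, partial). `partial` is True when collection stopped
    because a limit was reached before the input was exhausted.
    """
    deadline = None
    if time_allowed_ms is not None:
        deadline = clock() + time_allowed_ms / 1000.0

    hits: List[int] = []
    for doc_id in doc_ids:
        # Time budget: bounds latency, whatever the number of hits so far.
        if deadline is not None and clock() >= deadline:
            return hits, True
        # Hit budget: bounds the number of hits, but not the elapsed time.
        if max_hits is not None and len(hits) >= max_hits:
            return hits, True
        hits.append(doc_id)
    return hits, False


# Hit budget only: stops once three matches have been collected.
print(collect(range(10), max_hits=3))  # ([0, 1, 2], True)
```

With only a hit budget, a shard where matches arrive slowly keeps working 
until the count is reached; with a time budget, the shard returns whatever it 
has when the deadline passes.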

> Add support for maxHits
> -----------------------
>
>                 Key: SOLR-17447
>                 URL: https://issues.apache.org/jira/browse/SOLR-17447
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SearchComponents - other
>            Reporter: Siju Varghese
>            Priority: Minor
>
> Currently there are 3 mechanisms to control # of hits for a query
>  * Use of the _timeAllowed_ query parameter - Though this does not directly 
> control the number of hits, it has a similar effect, with the collector 
> terminating after the specified time budget has been exceeded. The primary 
> objective of this switch is to control runaway queries.
>  * Use of the {{segmentTerminateEarly}} parameter - This parameter is 
> only applicable for sorted segments where the sort criteria requested match 
> the sort criteria used in the SortingMergePolicy.
>  * Use of the cpuAllowed parameter to put an upper bound on CPU time for a 
> query.
>  
> I would like to propose a new _maxHits_ parameter. This parameter early 
> terminates the query once it has gone past the provided number of hits per 
> shard.
> For us, the motivation for such a parameter is the following:
> Our search is extremely latency sensitive, and the query set is a mix of 
> very high frequency tokens, where we favor fast recall, and typical search 
> queries, where we favor precision at low latency. The former can be thought 
> of as a search-as-you-type use case: we want to return results quickly and 
> go over just enough documents, a number we plan to control via the maxHits 
> parameter. We can't use a sorted index for our use case because the sort 
> criterion is a ranking function based on document features and the 
> user input.
> With the maxHits parameter, it is quite likely that the results returned 
> are not the most relevant ones; however, that is acceptable for us.
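
For reference, the existing limits listed in the description are ordinary 
request parameters. The sketch below only builds a select URL with the 
standard library; the host, the collection name "products", the query, and 
the helper name are assumptions for illustration. _timeAllowed_ takes a value 
in milliseconds. maxHits is the parameter proposed in this issue, so it is 
left out of the example.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, collection: str, query: str,
                     rows: int = 10, **limits) -> str:
    """Build a Solr /select URL (hypothetical helper, illustration only).

    Extra keyword arguments are passed through as request parameters,
    e.g. timeAllowed=50 or segmentTerminateEarly="true".
    """
    params = {"q": query, "rows": rows}
    params.update(limits)
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Assumed local Solr instance and collection name.
url = build_select_url("http://localhost:8983/solr", "products",
                       "title:lap", timeAllowed=50)
print(url)
# http://localhost:8983/solr/products/select?q=title%3Alap&rows=10&timeAllowed=50
```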



