[jira] [Updated] (SOLR-17447) Add support for maxHitsAllowed

Houston Putman (Jira) Thu, 03 Apr 2025 18:14:47 -0700


     [ https://issues.apache.org/jira/browse/SOLR-17447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Houston Putman updated SOLR-17447:
----------------------------------
    Description: 
Currently there are three mechanisms to control the number of hits for a query:
 * Use of the _timeAllowed_ query parameter - Though this does not directly
control the number of hits, it has a similar effect, because the collector
terminates once the specified time budget has been exceeded. The primary
objective of this switch is to control runaway queries.
 * Use of the {{segmentTerminateEarly}} parameter - This parameter is only
applicable to sorted segments where the requested sort criteria match the sort
criteria used in the SortingMergePolicy.
 * Use of the _cpuAllowed_ parameter to put an upper bound on the CPU time for
a query.

I would like to propose a new _maxHitsAllowed_ parameter. This parameter
terminates the query early once it has gone past the provided number of hits
per shard.
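A minimal sketch of the proposed per-shard behavior, written in Python for illustration only. It assumes that matching documents are visited segment by segment and that the collector keeps a top-k heap of the best hits; the names CappedCollector and search_shard are hypothetical and are not Solr or Lucene APIs. Once the hit budget is spent, the next match stops iteration across all remaining segments and the result is flagged as terminated early.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class CappedCollector:
    """Hypothetical per-shard collector enforcing a maxHitsAllowed-style budget."""

    max_hits: int
    top_k: int
    hits_seen: int = 0
    terminated_early: bool = False
    _heap: list = field(default_factory=list)  # min-heap of (score, doc_id)

    def collect(self, doc_id: int, score: float) -> bool:
        """Record one matching doc; return False once the hit budget is exhausted."""
        if self.hits_seen >= self.max_hits:
            # A match beyond the budget: signal the caller to stop iterating.
            self.terminated_early = True
            return False
        self.hits_seen += 1
        # Keep only the top_k best-scoring hits seen so far.
        if len(self._heap) < self.top_k:
            heapq.heappush(self._heap, (score, doc_id))
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, (score, doc_id))
        return True

    def results(self) -> list:
        """Return the collected (score, doc_id) pairs, best first."""
        return sorted(self._heap, reverse=True)


def search_shard(segments, max_hits: int, top_k: int) -> CappedCollector:
    """Visit (doc_id, score) matches segment by segment until the budget runs out."""
    collector = CappedCollector(max_hits=max_hits, top_k=top_k)
    for segment in segments:
        for doc_id, score in segment:
            if not collector.collect(doc_id, score):
                return collector  # skip all remaining docs and segments
    return collector


segments = [[(0, 1.0), (1, 3.0), (2, 2.0)], [(10, 5.0), (11, 4.0)]]
shard = search_shard(segments, max_hits=4, top_k=2)
print(shard.hits_seen, shard.terminated_early, shard.results())
# prints: 4 True [(5.0, 10), (3.0, 1)]
```

Because the budget in this model applies per shard, a distributed request over S shards could still visit up to S times maxHitsAllowed hits in total before the shard responses are merged.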

For us, the motivation for such a parameter is the following:

Our search is extremely latency sensitive, and the query set is a mix of very
high-frequency tokens, where we favor fast recall, and typical search queries,
where we favor precision at low latency. The former can be thought of as a
search-as-you-type use case: we want to return results quickly and go over
just enough documents, a number we plan to control via the maxHitsAllowed
parameter. We can't use a sorted index for our use case because the sort
criterion is a ranking function based on document features and the user
input.

With the maxHitsAllowed parameter, the returned results will quite likely not
be the most relevant ones; however, that is acceptable for us.
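The relevance trade-off described above can be illustrated with a toy sketch (Python, hypothetical function name top_k; not Solr code). It assumes matches arrive in index order and compares the top-k from scoring every match against the top-k from scoring only the first max_hits matches: the best-scoring document can lie outside the capped prefix and never be considered.

```python
import heapq
from itertools import islice


def top_k(matches, k, max_hits=None):
    """Return the k best (score, doc_id) pairs from (doc_id, score) matches.

    If max_hits is set, only the first max_hits matches in index order are
    scored, mimicking a hypothetical maxHitsAllowed-style cap.
    """
    candidates = matches if max_hits is None else islice(matches, max_hits)
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in candidates))


matches = [(0, 0.2), (1, 0.9), (2, 0.4), (3, 0.95), (4, 0.1)]
print(top_k(matches, 2))              # [(0.95, 3), (0.9, 1)]
print(top_k(matches, 2, max_hits=3))  # [(0.9, 1), (0.4, 2)]: doc 3 is never scored
```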

  was:
Currently there are three mechanisms to control the number of hits for a query:
 * Use of the _timeAllowed_ query parameter - Though this does not directly
control the number of hits, it has a similar effect, because the collector
terminates once the specified time budget has been exceeded. The primary
objective of this switch is to control runaway queries.
 * Use of the {{segmentTerminateEarly}} parameter - This parameter is only
applicable to sorted segments where the requested sort criteria match the sort
criteria used in the SortingMergePolicy.
 * Use of the _cpuAllowed_ parameter to put an upper bound on the CPU time for
a query.

I would like to propose a new _maxHits_ parameter. This parameter terminates
the query early once it has gone past the provided number of hits per shard.

For us, the motivation for such a parameter is the following:

Our search is extremely latency sensitive, and the query set is a mix of very
high-frequency tokens, where we favor fast recall, and typical search queries,
where we favor precision at low latency. The former can be thought of as a
search-as-you-type use case: we want to return results quickly and go over
just enough documents, a number we plan to control via the maxHits parameter.
We can't use a sorted index for our use case because the sort criterion is a
ranking function based on document features and the user input.

With the maxHits parameter, the returned results will quite likely not be the
most relevant ones; however, that is acceptable for us.

        Summary: Add support for maxHitsAllowed  (was: Add support for maxHits)

> Add support for maxHitsAllowed
> ------------------------------
>
>                 Key: SOLR-17447
>                 URL: https://issues.apache.org/jira/browse/SOLR-17447
>             Project: Solr
>          Issue Type: New Feature
>          Components: SearchComponents - other
>            Reporter: Siju Varghese
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: 
> Add_support_for_maxHits__Max_hits_is_a_hard_value_for_number__of_hits_the_searcher_iterate1.patch
>
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
