[jira] [Commented] (SOLR-17670) Fix unnecessary memory allocation caused by a large reRankDocs param

ASF subversion and git services (Jira) Sat, 22 Feb 2025 10:42:04 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-17670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929380#comment-17929380
 ]


ASF subversion and git services commented on SOLR-17670:
--------------------------------------------------------

Commit 6e2b61e529ad2c8d9068740dffb9cab8f4d9416e in solr's branch 
refs/heads/branch_9_8 from jiabao.gao
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=6e2b61e529a ]

SOLR-17670: Fix unnecessary memory allocation caused by a large reRankDocs 
param (#3181)

(cherry picked from commit 76c09a35dba42913a6bcb281b52b00f87564624a)


> Fix unnecessary memory allocation caused by a large reRankDocs param
> --------------------------------------------------------------------
>
>                 Key: SOLR-17670
>                 URL: https://issues.apache.org/jira/browse/SOLR-17670
>             Project: Solr
>          Issue Type: Bug
>            Reporter: JiaBaoGao
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The reRank function has a reRankDocs parameter that specifies the number of 
> documents to re-rank. I've observed that increasing this parameter to test 
> its performance impact causes queries to become progressively slower. Even 
> when the parameter value exceeds the total number of documents in the index, 
> further increases continue to slow down the query, which is counterintuitive.
>  
> Therefore, I investigated the code:
>  
> For a query containing re-ranking, such as:
> {code:java}
> {
> "start": "0",
> "rows": 10,
> "fl": "ID,score",
> "q": "*:*",
> "rq": "{!rerank reRankQuery='{!func} 100' reRankDocs=1000000000 reRankWeight=2}"
> } {code}
>  
> The current execution logic is as follows:
> 1. Perform normal retrieval using the q parameter.
> 2. Re-score the top reRankDocs documents from phase 1 using the rq parameter.
>  
> During the retrieval in phase 1 (using q), a TopScoreDocCollector is created. 
> Internally, this creates a PriorityQueue backed by an Object[]. The length of 
> this Object[] grows with reRankDocs and has no upper bound. 
>  
> On my local test cluster with limited JVM memory, this can even trigger an 
> OOM, causing the Solr node to crash. I can also reproduce the OOM situation 
> using the SolrCloudTestCase unit test. 
>  
> I think capping the length of the Object[] array at 
> searcher.getIndexReader().maxDoc() in ReRankCollector would resolve this 
> issue. That way, once reRankDocs exceeds maxDoc, the memory allocation stops 
> growing. 
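
The cap proposed in the description above can be sketched roughly as follows. This is an illustration only, not the committed Solr change: the class ReRankClampSketch and both helper methods are hypothetical names, the index size of 50,000 is made up, and the lower bound of 1 is an assumption added so that an empty index never produces a zero-sized queue. The byte figures count only the reference slots of the array, assuming 4 bytes per reference (compressed object pointers).

```java
// Sketch only: a stand-in for the cap proposed in the issue, not the actual Solr code.
class ReRankClampSketch {

    // Hypothetical helper: cap the requested re-rank window at the index size.
    // The lower bound of 1 is an assumption so an empty index never yields a zero-sized queue.
    static int effectiveReRankDocs(int reRankDocs, int maxDoc) {
        return Math.max(1, Math.min(reRankDocs, maxDoc));
    }

    // Rough size of the reference slots alone in an Object[] of the given length.
    // bytesPerRef is typically 4 with compressed object pointers and 8 without.
    static long approxReferenceBytes(int slots, int bytesPerRef) {
        return (long) slots * bytesPerRef;
    }

    public static void main(String[] args) {
        int reRankDocs = 1_000_000_000; // value from the example query in the issue
        int maxDoc = 50_000;            // hypothetical index size
        int capped = effectiveReRankDocs(reRankDocs, maxDoc);

        System.out.println("uncapped slots: " + reRankDocs
                + ", approx bytes: " + approxReferenceBytes(reRankDocs, 4));
        System.out.println("capped slots:   " + capped
                + ", approx bytes: " + approxReferenceBytes(capped, 4));
    }
}
```

With the cap in place, raising reRankDocs past the number of documents in the index no longer changes the size of the queue, which matches the expectation stated in the description.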



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
