JiaBaoGao created SOLR-17670:
--------------------------------

             Summary: Fix unnecessary memory allocation caused by a large reRankDocs param
                 Key: SOLR-17670
                 URL: https://issues.apache.org/jira/browse/SOLR-17670
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: JiaBaoGao
The reRank query parser has a reRankDocs parameter that specifies the number of documents to re-rank. While testing its performance impact, I observed that queries become progressively slower as this parameter grows. Even after the value exceeds the total number of documents in the index, further increases keep slowing the query down, which is counterintuitive, so I investigated the code.

For a query containing re-ranking, such as:

{code:java}
{
  "start": "0",
  "rows": 10,
  "fl": "ID,score",
  "q": "*:*",
  "rq": "{!rerank reRankQuery='{!func} 100' reRankDocs=1000000000 reRankWeight=2}"
}
{code}

the current execution logic is as follows:

1. Perform the normal retrieval using the q parameter.
2. Re-score the documents retrieved in the q phase using the rq query.

During the retrieval in phase 1, a TopScoreDocCollector is created. Internally it allocates a PriorityQueue backed by an Object[], and the length of that array grows with reRankDocs without any upper bound. On my local test cluster with limited JVM memory this can even trigger an OOM, crashing the Solr node. I can also reproduce the OOM in a SolrCloudTestCase unit test.

I think capping the length of the Object[] array with searcher.getIndexReader().maxDoc() in ReRankCollector would resolve this issue: once reRankDocs exceeds maxDoc, memory allocation would no longer keep growing.

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
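To make the proposed cap concrete, here is a minimal, self-contained sketch of the sizing logic. The helper name capCollectorSize and the standalone class are illustrative only (they are not Solr's actual API); the idea is simply to bound the first-pass collector size by maxDoc before TopScoreDocCollector allocates its heap.

{code:java}
// Sketch of the proposed fix: cap the number of hits the first-pass
// collector keeps, so a huge reRankDocs cannot inflate the PriorityQueue's
// backing Object[]. Names are hypothetical, not Solr's real API.
public class ReRankSizeCap {

    /** Returns the number of slots the first-pass collector should allocate. */
    static int capCollectorSize(int reRankDocs, int maxDoc) {
        // Never request more slots than there are documents in the index;
        // the Math.max guard keeps the size >= 1 for an empty index, since
        // TopScoreDocCollector requires numHits >= 1.
        return Math.min(reRankDocs, Math.max(maxDoc, 1));
    }

    public static void main(String[] args) {
        // reRankDocs far beyond the index size collapses to maxDoc
        System.out.println(capCollectorSize(1_000_000_000, 50_000)); // 50000
        // small reRankDocs values are unaffected
        System.out.println(capCollectorSize(200, 50_000)); // 200
    }
}
{code}

With this cap in place, raising reRankDocs past the index size no longer changes the allocation, which matches the expectation that performance should plateau once every document is already being re-ranked.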