Re: [PR] SOLR-8282: Switch CoreContainer#getSolrHome to return Path instead of String [solr]

2025-02-22 Thread via GitHub


dsmiley commented on PR #3204:
URL: https://github.com/apache/solr/pull/3204#issuecomment-2676562570

   Excellent!  I love how focused this PR is.
   
   Can you please edit the JIRA association to be 
[SOLR-16903](https://issues.apache.org/jira/browse/SOLR-16903) -- PR title & 
CHANGES.txt.  There's already a CHANGES.txt entry for 16903.  Needn't 
specify every single API endpoint / category since by 10.0, I think we hope we 
can basically say, "Changed all Java APIs using File to NIO Path" and not get 
into further details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Closed] (SOLR-8282) Move Java APIs to NIO2 (Path), and ban use of java.io.File

2025-02-22 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley closed SOLR-8282.
--

> Move Java APIs to NIO2 (Path), and ban use of java.io.File
> --
>
> Key: SOLR-8282
> URL: https://issues.apache.org/jira/browse/SOLR-8282
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is an umbrella issue for removing all usage of java.io.File from Solr, 
> and replacing it with NIO2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] Feature/solr-17334 Minor bugs in Solr dedicated coordinator mode [solr]

2025-02-22 Thread via GitHub


github-actions[bot] commented on PR #2672:
URL: https://github.com/apache/solr/pull/2672#issuecomment-2676457403

   This PR has had no activity for 60 days and is now labeled as stale.  Any 
new activity will remove the stale label.  To attract more reviewers, please 
tag people who might be familiar with the code area and/or notify the 
d...@solr.apache.org mailing list. To exempt this PR from being marked as 
stale, make it a draft PR or add the label "exempt-stale". If left unattended, 
this PR will be closed after another 60 days of inactivity. Thank you for your 
contribution!





Re: [PR] Coordinator: remove SolrQueryRequest.getCloudDescriptor [solr]

2025-02-22 Thread via GitHub


github-actions[bot] commented on PR #2913:
URL: https://github.com/apache/solr/pull/2913#issuecomment-2676457368

   This PR has had no activity for 60 days and is now labeled as stale.  Any 
new activity will remove the stale label.  To attract more reviewers, please 
tag people who might be familiar with the code area and/or notify the 
d...@solr.apache.org mailing list. To exempt this PR from being marked as 
stale, make it a draft PR or add the label "exempt-stale". If left unattended, 
this PR will be closed after another 60 days of inactivity. Thank you for your 
contribution!





Re: [PR] SOLR-17518: Deprecate UpdateRequest.getXml() and replace it with XMLRequestWriter [solr]

2025-02-22 Thread via GitHub


dsmiley commented on code in PR #3200:
URL: https://github.com/apache/solr/pull/3200#discussion_r1966671050


##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/XMLRequestWriter.java:
##
@@ -0,0 +1,216 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.client.solrj.impl;
+
+import java.io.BufferedWriter;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.io.OutputStreamWriter;
+import java.io.Writer;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import org.apache.solr.client.solrj.SolrRequest;
+import org.apache.solr.client.solrj.request.RequestWriter;
+import org.apache.solr.client.solrj.request.UpdateRequest;
+import org.apache.solr.client.solrj.util.ClientUtils;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.ShardParams;
+import org.apache.solr.common.util.ContentStream;
+import org.apache.solr.common.util.XML;
+
+public class XMLRequestWriter extends RequestWriter {
+
+  /**
+   * Use this to do a push write instead of a pull write. If this method returns null, {@link
+   * org.apache.solr.client.solrj.request.RequestWriter#getContentStreams(SolrRequest)} is invoked
+   * to do a pull write.
+   */
+  @Override
+  public RequestWriter.ContentWriter getContentWriter(SolrRequest<?> req) {
+    if (req instanceof UpdateRequest updateRequest) {
+      if (isEmpty(updateRequest)) return null;
+      return new RequestWriter.ContentWriter() {
+        @Override
+        public void write(OutputStream os) throws IOException {
+          OutputStreamWriter writer = new OutputStreamWriter(os, StandardCharsets.UTF_8);
+          writeXML(updateRequest, writer);
+          writer.flush();
+        }
+
+        @Override
+        public String getContentType() {
+          return ClientUtils.TEXT_XML;
+        }
+      };
+    }
+    return req.getContentWriter(ClientUtils.TEXT_XML);
+  }
+
+  @Override
+  public Collection<ContentStream> getContentStreams(SolrRequest<?> req) throws IOException {
+    if (req instanceof UpdateRequest) {
+      return null;
+    }
+    return req.getContentStreams();
+  }
+
+  @Override
+  public void write(SolrRequest<?> request, OutputStream os) throws IOException {
+    if (request instanceof UpdateRequest updateRequest) {
+      BufferedWriter writer =
+          new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
+      writeXML(updateRequest, writer);
+      writer.flush();
+    }
+  }
+
+  @Override
+  public String getUpdateContentType() {
+    return ClientUtils.TEXT_XML;
+  }
+
+  public void writeXML(UpdateRequest request, Writer writer) throws IOException {
+    List<Map<SolrInputDocument, Map<String, Object>>> getDocLists = getDocLists(request);
+
+    for (Map<SolrInputDocument, Map<String, Object>> docs : getDocLists) {
+
+      if (docs != null && !docs.isEmpty()) {
+        Map.Entry<SolrInputDocument, Map<String, Object>> firstDoc =
+            docs.entrySet().iterator().next();
+        Map<String, Object> map = firstDoc.getValue();
+        Integer cw = null;

Review Comment:
   he didn't write this code






Re: [PR] SOLR-17518: Deprecate UpdateRequest.getXml() and replace it with XMLRequestWriter [solr]

2025-02-22 Thread via GitHub


dsmiley commented on code in PR #3200:
URL: https://github.com/apache/solr/pull/3200#discussion_r198902


##
solr/solrj/src/java/org/apache/solr/client/solrj/request/UpdateRequest.java:
##
@@ -368,147 +368,39 @@ public void setDeleteQuery(List<String> deleteQuery) {
   // --
   // --
 
+  /**
+   * @deprecated Method will be removed in Solr 10.0. Use {@link XMLRequestWriter} instead.
+   */
+  @Deprecated(since = "9.9")
   @Override
   public Collection<ContentStream> getContentStreams() throws IOException {
     return ClientUtils.toContentStreams(getXML(), ClientUtils.TEXT_XML);
   }
 
+  /**
+   * @deprecated Method will be removed in Solr 10.0. Use {@link XMLRequestWriter} instead.
+   */
+  @Deprecated(since = "9.9")
   public String getXML() throws IOException {
     StringWriter writer = new StringWriter();
     writeXML(writer);
     writer.flush();

     // If action is COMMIT or OPTIMIZE, it is sent with params
     String xml = writer.toString();
-    // System.out.println( "SEND:"+xml );
     return (xml.length() > 0) ? xml : null;
   }
 
-  private List<Map<SolrInputDocument, Map<String, Object>>> getDocLists(
-      Map<SolrInputDocument, Map<String, Object>> documents) {
-    List<Map<SolrInputDocument, Map<String, Object>>> docLists = new ArrayList<>();
-    Map<SolrInputDocument, Map<String, Object>> docList = null;
-    if (this.documents != null) {
-
-      Boolean lastOverwrite = true;
-      Integer lastCommitWithin = -1;
-
-      Set<Entry<SolrInputDocument, Map<String, Object>>> entries = this.documents.entrySet();
-      for (Entry<SolrInputDocument, Map<String, Object>> entry : entries) {
-        Map<String, Object> map = entry.getValue();
-        Boolean overwrite = null;
-        Integer commitWithin = null;
-        if (map != null) {
-          overwrite = (Boolean) entry.getValue().get(OVERWRITE);
-          commitWithin = (Integer) entry.getValue().get(COMMIT_WITHIN);
-        }
-        if (!Objects.equals(overwrite, lastOverwrite)
-            || !Objects.equals(commitWithin, lastCommitWithin)
-            || docLists.isEmpty()) {
-          docList = new LinkedHashMap<>();
-          docLists.add(docList);
-        }
-        docList.put(entry.getKey(), entry.getValue());
-        lastCommitWithin = commitWithin;
-        lastOverwrite = overwrite;
-      }
-    }
-
-    if (docIterator != null) {
-      docList = new LinkedHashMap<>();
-      docLists.add(docList);
-      while (docIterator.hasNext()) {
-        SolrInputDocument doc = docIterator.next();
-        if (doc != null) {
-          docList.put(doc, null);
-        }
-      }
-    }
-
-    return docLists;
-  }
-
   /**
-   * @since solr 1.4
+   * @deprecated Method will be removed in Solr 10.0. Use {@link XMLRequestWriter} instead.
    */
+  @Deprecated(since = "9.9")
   public UpdateRequest writeXML(Writer writer) throws IOException {
-    List<Map<SolrInputDocument, Map<String, Object>>> getDocLists = getDocLists(documents);
-
-    for (Map<SolrInputDocument, Map<String, Object>> docs : getDocLists) {
-
-      if ((docs != null && docs.size() > 0)) {
-        Entry<SolrInputDocument, Map<String, Object>> firstDoc = docs.entrySet().iterator().next();
-        Map<String, Object> map = firstDoc.getValue();
-        Integer cw = null;
-        Boolean ow = null;
-        if (map != null) {
-          cw = (Integer) firstDoc.getValue().get(COMMIT_WITHIN);
-          ow = (Boolean) firstDoc.getValue().get(OVERWRITE);
-        }
-        if (ow == null) ow = true;
-        int commitWithin = (cw != null && cw != -1) ? cw : this.commitWithin;
-        boolean overwrite = ow;
-        if (commitWithin > -1 || overwrite != true) {
-          writer.write("<add commitWithin=\"" + commitWithin + "\" overwrite=\"" + overwrite + "\">");
-        } else {
-          writer.write("<add>");
-        }
-
-        Set<Entry<SolrInputDocument, Map<String, Object>>> entries = docs.entrySet();
-        for (Entry<SolrInputDocument, Map<String, Object>> entry : entries) {
-          ClientUtils.writeXML(entry.getKey(), writer);
-        }
-
-        writer.write("</add>");
-      }
-    }
-
-    // Add the delete commands
-    boolean deleteI = deleteById != null && deleteById.size() > 0;
-    boolean deleteQ = deleteQuery != null && deleteQuery.size() > 0;
-    if (deleteI || deleteQ) {
-      if (commitWithin > 0) {
-        writer.append("<delete commitWithin=\"" + commitWithin + "\">");
-      } else {
-        writer.append("<delete>");
-      }
-      if (deleteI) {
-        for (Map.Entry<String, Map<String, Object>> entry : deleteById.entrySet()) {
-          writer.append("<id");
-          Map<String, Object> map = entry.getValue();
-          if (map != null) {
-            Long version = (Long) map.get(VER);
-            String route = (String) map.get(_ROUTE_);
-            if (version != null) {
-              writer.append(" version=\"").append(String.valueOf(version)).append('"');
-            }
-
-            if (route != null) {
-              writer.append(" _route_=\"").append(route).append('"');
-            }
-          }
-          writer.append(">");
-
-          XML.escapeCharData(entry.getKey(), writer);
-          writer.append("</id>");
-        }
-      }
-      if (deleteQ) {
-        for (String q : deleteQuery) {
-          writer.append("<query>");
-          XML.escapeCharData(q, writer);
-          writer.append("</query>");
-        }
-      }
-      writer.append("</delete>");
-    }
+    XMLRequestWriter requestWriter = new XMLR

[jira] [Commented] (SOLR-17670) Fix unnecessary memory allocation caused by a large reRankDocs param

2025-02-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929379#comment-17929379
 ] 

ASF subversion and git services commented on SOLR-17670:


Commit 17f12dd7462691933f13285f93a2b5c444a3 in solr's branch 
refs/heads/branch_9x from jiabao.gao
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=17f12dd ]

SOLR-17670: Fix unnecessary memory allocation caused by a large reRankDocs 
param (#3181)

(cherry picked from commit 76c09a35dba42913a6bcb281b52b00f87564624a)


> Fix unnecessary memory allocation caused by a large reRankDocs param
> --------------------------------------------------------------------
>
> Key: SOLR-17670
> URL: https://issues.apache.org/jira/browse/SOLR-17670
> Project: Solr
>  Issue Type: Bug
>Reporter: JiaBaoGao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The reRank function has a reRankDocs parameter that specifies the number of 
> documents to re-rank. I've observed that increasing this parameter to test 
> its performance impact causes queries to become progressively slower. Even 
> when the parameter value exceeds the total number of documents in the index, 
> further increases continue to slow down the query, which is counterintuitive.
>  
> Therefore, I investigated the code:
>  
> For a query containing re-ranking, such as:
> {code:java}
> {
> "start": "0",
> "rows": 10,
> "fl": "ID,score",
> "q": "*:*",
> "rq": "{!rerank reRankQuery='{!func} 100' reRankDocs=10 
> reRankWeight=2}"
> } {code}
>  
> The current execution logic is as follows:
> 1. Perform normal retrieval using the q parameter.
> 2. Re-score all documents retrieved in the q phase using the rq parameter.
>  
> During the retrieval in phase 1 (using q), a TopScoreDocCollector is created. 
> Underneath, this creates a PriorityQueue which contains an Object[]. The 
> length of this Object[] continuously increases with reRankDocs without any 
> limit. 
>  
> On my local test cluster with limited JVM memory, this can even trigger an 
> OOM, causing the Solr node to crash. I can also reproduce the OOM situation 
> using the SolrCloudTestCase unit test. 
>  
> I think limiting the length of the Object[] array using 
> searcher.getIndexReader().maxDoc() at ReRankCollector would resolve this 
> issue. This way, when reRankDocs exceeds maxDoc, memory allocation will not 
> continue to increase indefinitely. 
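The clamp proposed above can be sketched in a few lines. This is an illustrative sketch only, not code from the PR: the class and method names (`ReRankSizingSketch`, `cappedReRankDocs`) are hypothetical, standing in for what `ReRankCollector` would do with `searcher.getIndexReader().maxDoc()`.

```java
// Hypothetical sketch of the fix described in the issue: a priority queue
// larger than the index's maxDoc can never hold more hits than the index
// contains, so cap reRankDocs before sizing the collector's backing array.
class ReRankSizingSketch {
    static int cappedReRankDocs(int reRankDocs, int maxDoc) {
        return Math.min(reRankDocs, maxDoc);
    }

    public static void main(String[] args) {
        // With a 5,000-doc index, reRankDocs=1,000,000 allocates only 5,000 slots.
        System.out.println(cappedReRankDocs(1_000_000, 5_000)); // prints 5000
    }
}
```

With this bound in place, raising reRankDocs past the index size no longer grows the allocation, which matches the behavior the reporter expected.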






Re: [PR] SOLR-17670: Fix unnecessary memory allocation caused by a large reRankDocs param [solr]

2025-02-22 Thread via GitHub


gaojiabao1991 commented on PR #3181:
URL: https://github.com/apache/solr/pull/3181#issuecomment-2676365101

   
   
   
   > I'll be merging this nice fix into 9.8.1
   
   Thanks so much, David!





Re: [PR] SOLR-17677: Ensure DBQ is safe before running [solr]

2025-02-22 Thread via GitHub


dsmiley commented on PR #3203:
URL: https://github.com/apache/solr/pull/3203#issuecomment-2676413619

   It is trivial to ensure HashRangeQuery doesn't NEED SolrIndexSearcher: #3206





[PR] SOLR-17677: HashRangeQuery doesn't NEED SolrIndexSearcher [solr]

2025-02-22 Thread via GitHub


dsmiley opened a new pull request, #3206:
URL: https://github.com/apache/solr/pull/3206

   https://issues.apache.org/jira/browse/SOLR-17677
   





Re: [PR] SOLR-17631: Upgrade main to Lucene 10.x [solr]

2025-02-22 Thread via GitHub


janhoy commented on PR #3053:
URL: https://github.com/apache/solr/pull/3053#issuecomment-2676269144

   @chatman review feedback waiting :) 





Re: [PR] SOLR-17677: HashRangeQuery doesn't NEED SolrIndexSearcher [solr]

2025-02-22 Thread via GitHub


dsmiley commented on PR #3206:
URL: https://github.com/apache/solr/pull/3206#issuecomment-2676422679

   Tests pass (GHA for Crave is still out-of-commission)





Re: [PR] SOLR-17670: Fix unnecessary memory allocation caused by a large reRankDocs param [solr]

2025-02-22 Thread via GitHub


dsmiley commented on PR #3181:
URL: https://github.com/apache/solr/pull/3181#issuecomment-2676315964

   I'll be merging this nice fix into 9.8.1





[jira] [Commented] (SOLR-17670) Fix unnecessary memory allocation caused by a large reRankDocs param

2025-02-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929377#comment-17929377
 ] 

ASF subversion and git services commented on SOLR-17670:


Commit 76c09a35dba42913a6bcb281b52b00f87564624a in solr's branch 
refs/heads/main from jiabao.gao
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=76c09a35dba ]

SOLR-17670: Fix unnecessary memory allocation caused by a large reRankDocs 
param (#3181)



> Fix unnecessary memory allocation caused by a large reRankDocs param
> --------------------------------------------------------------------
>
> Key: SOLR-17670
> URL: https://issues.apache.org/jira/browse/SOLR-17670
> (full description quoted earlier in this digest)






Re: [PR] SOLR-17670: Fix unnecessary memory allocation caused by a large reRankDocs param [solr]

2025-02-22 Thread via GitHub


dsmiley merged PR #3181:
URL: https://github.com/apache/solr/pull/3181





[jira] [Commented] (SOLR-17670) Fix unnecessary memory allocation caused by a large reRankDocs param

2025-02-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929380#comment-17929380
 ] 

ASF subversion and git services commented on SOLR-17670:


Commit 6e2b61e529ad2c8d9068740dffb9cab8f4d9416e in solr's branch 
refs/heads/branch_9_8 from jiabao.gao
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=6e2b61e529a ]

SOLR-17670: Fix unnecessary memory allocation caused by a large reRankDocs 
param (#3181)

(cherry picked from commit 76c09a35dba42913a6bcb281b52b00f87564624a)


> Fix unnecessary memory allocation caused by a large reRankDocs param
> --------------------------------------------------------------------
>
> Key: SOLR-17670
> URL: https://issues.apache.org/jira/browse/SOLR-17670
> (full description quoted earlier in this digest)






[jira] [Updated] (SOLR-17670) Fix unnecessary memory allocation caused by a large reRankDocs param

2025-02-22 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-17670:

Fix Version/s: 9.8.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks for contributing!

> Fix unnecessary memory allocation caused by a large reRankDocs param
> --------------------------------------------------------------------
>
> Key: SOLR-17670
> URL: https://issues.apache.org/jira/browse/SOLR-17670
> Fix For: 9.8.1
> (full description quoted earlier in this digest)


