gf2121 commented on code in PR #14679:
URL: https://github.com/apache/lucene/pull/14679#discussion_r2099522207
##########
lucene/core/src/java/org/apache/lucene/search/Scorer.java:
##########
@@ -76,4 +77,57 @@ public int advanceShallow(int target) throws IOException {
   * {@link #advanceShallow(int) shallow-advanced} to included and {@code upTo} included.
*/
public abstract float getMaxScore(int upTo) throws IOException;
+
+ /**
+   * Return a new batch of doc IDs and scores, starting at the current doc ID, and ending before
+   * {@code upTo}. Because it starts on the current doc ID, it is illegal to call this method if the
+   * {@link #docID() current doc ID} is {@code -1}.
+ *
+   * <p>An empty return value indicates that there are no postings left between the current doc ID
+   * and {@code upTo}.
+ *
+   * <p>Implementations should ideally fill the buffer with somewhere between 8 and a couple
+   * hundred entries, to keep heap requirements contained while still being large enough for
+   * operations on the buffer to auto-vectorize efficiently.
+ *
+ * <p>The default implementation is provided below:
+ *
+ * <pre class="prettyprint">
+ * int batchSize = 16; // arbitrary
+ * buffer.growNoCopy(batchSize);
+ * int size = 0;
+ * DocIdSetIterator iterator = iterator();
+   * for (int doc = docID(); doc < upTo && size < batchSize; doc = iterator.nextDoc()) {
+ * if (liveDocs == null || liveDocs.get(doc)) {
+ * buffer.docs[size] = doc;
+ * buffer.scores[size] = score();
+ * ++size;
+ * }
+ * }
+ * buffer.size = size;
+ * </pre>
+ *
+   * <p><b>NOTE</b>: The provided {@link DocAndScoreBuffer} should not hold references to internal
+ * data structures.
+ *
+   * <p><b>NOTE</b>: In case this {@link Scorer} exposes a {@link #twoPhaseIterator()
+   * TwoPhaseIterator}, it should be positioned on a matching document before this method is called.
+ *
+ * @lucene.internal
+ */
+  public void nextDocsAndScores(int upTo, Bits liveDocs, DocAndScoreBuffer buffer)
+ throws IOException {
+ int batchSize = 16; // arbitrary
+ buffer.growNoCopy(batchSize);
+ int size = 0;
+ DocIdSetIterator iterator = iterator();
Review Comment:
Many implementations return a new iterator here (e.g.
`TwoPhaseIterator.asDocIdSetIterator`); will constructing a new object for
every batch of 16 docs cause noticeable overhead?
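To illustrate the concern, here is a minimal, self-contained sketch of one possible mitigation: lazily caching the iterator in a field so `iterator()` allocates at most once per scorer, rather than once per 16-doc batch. All class and method names below are simplified stand-ins for illustration, not Lucene's actual implementation.

```java
import java.util.Arrays;

public class CachedIteratorSketch {
  // Minimal stand-in for a DocIdSetIterator over a fixed doc list.
  static class DocIdSetIterator {
    private final int[] docs;
    private int idx = -1;
    DocIdSetIterator(int... docs) { this.docs = docs; }
    int docID() { return idx < 0 ? -1 : idx >= docs.length ? Integer.MAX_VALUE : docs[idx]; }
    int nextDoc() { idx++; return docID(); }
  }

  private DocIdSetIterator cachedIterator; // created once, reused across batches

  DocIdSetIterator iterator() {
    if (cachedIterator == null) {
      // Hypothetical: in the real code this could wrap a two-phase iterator.
      cachedIterator = new DocIdSetIterator(3, 7, 12, 20);
    }
    return cachedIterator;
  }

  int[] nextBatch(int upTo, int batchSize) {
    DocIdSetIterator it = iterator(); // no per-batch object construction
    int[] buf = new int[batchSize];
    int size = 0;
    // Advance off the initial -1 position for this sketch; the real contract
    // forbids calling the batch method when docID() == -1.
    int doc = it.docID() == -1 ? it.nextDoc() : it.docID();
    for (; doc < upTo && size < batchSize; doc = it.nextDoc()) {
      buf[size++] = doc;
    }
    return Arrays.copyOf(buf, size);
  }

  public static void main(String[] args) {
    CachedIteratorSketch s = new CachedIteratorSketch();
    System.out.println(Arrays.toString(s.nextBatch(15, 16))); // docs below 15
  }
}
```

After the batch, the iterator stays positioned on the first doc at or beyond `upTo`, so the next call resumes from there without re-wrapping anything.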
##########
lucene/core/src/java/org/apache/lucene/search/TermScorer.java:
##########
@@ -120,4 +126,50 @@ public void setMinCompetitiveScore(float minScore) {
impactsDisi.setMinCompetitiveScore(minScore);
}
}
+
+ @Override
+  public void nextDocsAndScores(int upTo, Bits liveDocs, DocAndScoreBuffer buffer)
+ throws IOException {
+ if (docAndFreqBuffer == null) {
+ docAndFreqBuffer = new DocAndFreqBuffer();
+ }
+
+ for (; ; ) {
+ postingsEnum.nextPostings(upTo, docAndFreqBuffer);
+ if (liveDocs != null && docAndFreqBuffer.size != 0) {
+        // An empty return value indicates that there are no more docs before upTo. We may be
+        // unlucky, and there are docs left, but all docs from the current batch happen to be
+        // marked as deleted. So we need to iterate until we find a batch that has at least one
+        // non-deleted doc.
+ docAndFreqBuffer.apply(liveDocs);
+ if (docAndFreqBuffer.size == 0) {
+ continue;
+ }
+ }
+ break;
+ }
+
+ int size = docAndFreqBuffer.size;
+ normValues = ArrayUtil.growNoCopy(normValues, size);
+ if (norms == null) {
+ Arrays.fill(normValues, 0, size, 1L);
Review Comment:
Can we do this fill only when a grow actually happens?
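A minimal sketch of the suggestion, assuming (as in this scorer) that when `norms == null` the array only ever holds the constant `1L`, so an array that survived from a previous call is already filled and only a freshly allocated one needs the fill. Names are illustrative stand-ins, not Lucene's `ArrayUtil`.

```java
import java.util.Arrays;

public class LazyNormFill {
  private long[] normValues = new long[0];

  // Simplified stand-in for ArrayUtil.growNoCopy: reallocate only if too small.
  static long[] growNoCopy(long[] arr, int minSize) {
    if (arr.length >= minSize) return arr;
    return new long[Math.max(minSize, arr.length * 2)];
  }

  long[] normsFor(int size) {
    long[] grown = growNoCopy(normValues, size);
    if (grown != normValues) {
      normValues = grown;
      // Fill the whole new array once. An array reused from a previous call
      // already holds 1L everywhere, so no fill is needed on that path.
      Arrays.fill(normValues, 1L);
    }
    return normValues;
  }

  public static void main(String[] args) {
    LazyNormFill f = new LazyNormFill();
    long[] a = f.normsFor(4); // grows and fills
    long[] b = f.normsFor(2); // smaller request: reused, no fill
    System.out.println(a == b);
    System.out.println(b[0] + "," + b[1]);
  }
}
```

Note this is only safe if the same array is never overwritten with real norm values in between, which holds here because `norms` is fixed for the lifetime of the scorer.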
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##########
@@ -1034,6 +1035,50 @@ public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws IOExcept
}
}
+ @Override
+  public void nextPostings(int upTo, DocAndFreqBuffer buffer) throws IOException {
+ assert needsRefilling == false;
+
+ if (needsFreq == false) {
+ super.nextPostings(upTo, buffer);
+ return;
+ }
+
+ buffer.size = 0;
+ if (doc >= upTo) {
+ return;
+ }
+
+ // Only return docs from the current block
+ buffer.growNoCopy(BLOCK_SIZE);
+ upTo = (int) Math.min(upTo, level0LastDocID + 1L);
+
+    // Frequencies are decoded lazily; calling freq() makes sure that the freq block is decoded
+ freq();
+
+ int start = docBufferUpto - 1;
+ buffer.size = 0;
Review Comment:
Nit: `buffer.size` has already been set to 0 above (line 1047); can we avoid this one?
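One detail worth noting in the diff above is the clamp `(int) Math.min(upTo, level0LastDocID + 1L)`: batches never cross the current block boundary, and the `+ 1L` is done in long arithmetic so the clamp stays correct even if the block's last doc ID is a sentinel near `Integer.MAX_VALUE`. A tiny self-contained sketch of that pattern (names illustrative, not the actual reader code):

```java
public class BlockClampSketch {
  // Clamp the requested upper bound to just past the end of the current block.
  // level0LastDocID + 1L avoids int overflow when the last doc ID is a
  // sentinel such as Integer.MAX_VALUE.
  static int clampToBlock(int upTo, int level0LastDocID) {
    return (int) Math.min(upTo, level0LastDocID + 1L);
  }

  public static void main(String[] args) {
    // Current block ends at doc 255; caller asks for docs up to 10000.
    System.out.println(clampToBlock(10000, 255));
    // Sentinel block end: the clamp leaves upTo untouched instead of wrapping.
    System.out.println(clampToBlock(10000, Integer.MAX_VALUE));
  }
}
```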
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]