gf2121 commented on code in PR #14679:
URL: https://github.com/apache/lucene/pull/14679#discussion_r2099522207
##########
lucene/core/src/java/org/apache/lucene/search/Scorer.java:
##########
@@ -76,4 +77,57 @@ public int advanceShallow(int target) throws IOException {
   * {@link #advanceShallow(int) shallow-advanced} to included and {@code upTo} included.
*/
public abstract float getMaxScore(int upTo) throws IOException;
+
+ /**
+   * Return a new batch of doc IDs and scores, starting at the current doc ID, and ending before
+   * {@code upTo}. Because it starts on the current doc ID, it is illegal to call this method if the
+   * {@link #docID() current doc ID} is {@code -1}.
+ *
+   * <p>An empty return value indicates that there are no postings left between the current doc ID
+   * and {@code upTo}.
+ *
+   * <p>Implementations should ideally fill the buffer with somewhere between 8 and a couple
+   * hundred entries, to keep heap requirements contained while still being large enough for
+   * operations on the buffer to auto-vectorize efficiently.
+ *
+ * <p>The default implementation is provided below:
+ *
+ * <pre class="prettyprint">
+ * int batchSize = 16; // arbitrary
+ * buffer.growNoCopy(batchSize);
+ * int size = 0;
+ * DocIdSetIterator iterator = iterator();
+   * for (int doc = docID(); doc < upTo && size < batchSize; doc = iterator.nextDoc()) {
+ * if (liveDocs == null || liveDocs.get(doc)) {
+ * buffer.docs[size] = doc;
+ * buffer.scores[size] = score();
+ * ++size;
+ * }
+ * }
+ * buffer.size = size;
+ * </pre>
+ *
+   * <p><b>NOTE</b>: The provided {@link DocAndScoreBuffer} should not hold references to internal
+ * data structures.
+ *
+   * <p><b>NOTE</b>: In case this {@link Scorer} exposes a {@link #twoPhaseIterator()
+   * TwoPhaseIterator}, it should be positioned on a matching document before this method is called.
+ *
+ * @lucene.internal
+ */
+  public void nextDocsAndScores(int upTo, Bits liveDocs, DocAndScoreBuffer buffer)
+ throws IOException {
+ int batchSize = 16; // arbitrary
+ buffer.growNoCopy(batchSize);
+ int size = 0;
+ DocIdSetIterator iterator = iterator();
Review Comment:
Many implementations return a new iterator here (e.g.
`TwoPhaseIterator.asDocIdSetIterator`); will constructing a new object for
every batch of 16 docs cause noticeable overhead?
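To illustrate the concern, here is a minimal, self-contained sketch of one possible mitigation: lazily caching the iterator in a field so `iterator()` allocates at most once per scorer, rather than once per 16-doc batch. All class and method names below are simplified stand-ins for illustration, not Lucene's actual implementation.

```java
import java.util.Arrays;

public class CachedIteratorSketch {
  // Minimal stand-in for a DocIdSetIterator over a fixed doc list.
  static class DocIdSetIterator {
    private final int[] docs;
    private int idx = -1;
    DocIdSetIterator(int... docs) { this.docs = docs; }
    int docID() { return idx < 0 ? -1 : idx >= docs.length ? Integer.MAX_VALUE : docs[idx]; }
    int nextDoc() { idx++; return docID(); }
  }

  private DocIdSetIterator cachedIterator; // created once, reused across batches

  DocIdSetIterator iterator() {
    if (cachedIterator == null) {
      // Hypothetical: in the real code this could wrap a two-phase iterator.
      cachedIterator = new DocIdSetIterator(3, 7, 12, 20);
    }
    return cachedIterator;
  }

  int[] nextBatch(int upTo, int batchSize) {
    DocIdSetIterator it = iterator(); // no per-batch object construction
    int[] buf = new int[batchSize];
    int size = 0;
    // Advance off the initial -1 position for this sketch; the real contract
    // forbids calling the batch method when docID() == -1.
    int doc = it.docID() == -1 ? it.nextDoc() : it.docID();
    for (; doc < upTo && size < batchSize; doc = it.nextDoc()) {
      buf[size++] = doc;
    }
    return Arrays.copyOf(buf, size);
  }

  public static void main(String[] args) {
    CachedIteratorSketch s = new CachedIteratorSketch();
    System.out.println(Arrays.toString(s.nextBatch(15, 16))); // docs below 15
  }
}
```

After the batch, the iterator stays positioned on the first doc at or beyond `upTo`, so the next call resumes from there without re-wrapping anything.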
##########
lucene/core/src/java/org/apache/lucene/search/TermScorer.java:
##########
@@ -120,4 +126,50 @@ public void setMinCompetitiveScore(float minScore) {
impactsDisi.setMinCompetitiveScore(minScore);
}
}
+
+ @Override
+  public void nextDocsAndScores(int upTo, Bits liveDocs, DocAndScoreBuffer buffer)
+ throws IOException {
+ if (docAndFreqBuffer == null) {
+ docAndFreqBuffer = new DocAndFreqBuffer();
+ }
+
+ for (; ; ) {
+ postingsEnum.nextPostings(upTo, docAndFreqBuffer);
+ if (liveDocs != null && docAndFreqBuffer.size != 0) {
+        // An empty return value indicates that there are no more docs before upTo. We may be
+        // unlucky, and there are docs left, but all docs from the current batch happen to be
+        // marked as deleted. So we need to iterate until we find a batch that has at least one
+        // non-deleted doc.
+ docAndFreqBuffer.apply(liveDocs);
+ if (docAndFreqBuffer.size == 0) {
+ continue;
+ }
+ }
+ break;
+ }
+
+ int size = docAndFreqBuffer.size;
+ normValues = ArrayUtil.growNoCopy(normValues, size);
+ if (norms == null) {
+ Arrays.fill(normValues, 0, size, 1L);
Review Comment:
Can we do this fill only when a grow actually happens?
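A minimal sketch of the suggestion, assuming (as in this scorer) that when `norms == null` the array only ever holds the constant `1L`, so an array that survived from a previous call is already filled and only a freshly allocated one needs the fill. Names are illustrative stand-ins, not Lucene's `ArrayUtil`.

```java
import java.util.Arrays;

public class LazyNormFill {
  private long[] normValues = new long[0];

  // Simplified stand-in for ArrayUtil.growNoCopy: reallocate only if too small.
  static long[] growNoCopy(long[] arr, int minSize) {
    if (arr.length >= minSize) return arr;
    return new long[Math.max(minSize, arr.length * 2)];
  }

  long[] normsFor(int size) {
    long[] grown = growNoCopy(normValues, size);
    if (grown != normValues) {
      normValues = grown;
      // Fill the whole new array once. An array reused from a previous call
      // already holds 1L everywhere, so no fill is needed on that path.
      Arrays.fill(normValues, 1L);
    }
    return normValues;
  }

  public static void main(String[] args) {
    LazyNormFill f = new LazyNormFill();
    long[] a = f.normsFor(4); // grows and fills
    long[] b = f.normsFor(2); // smaller request: reused, no fill
    System.out.println(a == b);
    System.out.println(b[0] + "," + b[1]);
  }
}
```

Note this is only safe if the same array is never overwritten with real norm values in between, which holds here because `norms` is fixed for the lifetime of the scorer.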
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##########
@@ -1034,6 +1035,50 @@ public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws IOExcept
}
}
+ @Override
+  public void nextPostings(int upTo, DocAndFreqBuffer buffer) throws IOException {
+ assert needsRefilling == false;
+
+ if (needsFreq == false) {
+ super.nextPostings(upTo, buffer);
+ return;
+ }
+
+ buffer.size = 0;
+ if (doc >= upTo) {
+ return;
+ }
+
+ // Only return docs from the current block
+ buffer.growNoCopy(BLOCK_SIZE);
+ upTo = (int) Math.min(upTo, level0LastDocID + 1L);
+
+    // Frequencies are decoded lazily; calling freq() makes sure that the freq block is decoded
+ freq();
+
+ int start = docBufferUpto - 1;
+ buffer.size = 0;
Review Comment:
Nit: `buffer.size` has already been set to 0 above (line 1047); can we avoid this one?
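One detail worth noting in the diff above is the clamp `(int) Math.min(upTo, level0LastDocID + 1L)`: batches never cross the current block boundary, and the `+ 1L` is done in long arithmetic so the clamp stays correct even if the block's last doc ID is a sentinel near `Integer.MAX_VALUE`. A tiny self-contained sketch of that pattern (names illustrative, not the actual reader code):

```java
public class BlockClampSketch {
  // Clamp the requested upper bound to just past the end of the current block.
  // level0LastDocID + 1L avoids int overflow when the last doc ID is a
  // sentinel such as Integer.MAX_VALUE.
  static int clampToBlock(int upTo, int level0LastDocID) {
    return (int) Math.min(upTo, level0LastDocID + 1L);
  }

  public static void main(String[] args) {
    // Current block ends at doc 255; caller asks for docs up to 10000.
    System.out.println(clampToBlock(10000, 255));
    // Sentinel block end: the clamp leaves upTo untouched instead of wrapping.
    System.out.println(clampToBlock(10000, Integer.MAX_VALUE));
  }
}
```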
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]