Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]

via GitHub Thu, 24 Apr 2025 06:11:20 -0700


jpountz commented on code in PR #14511:
URL: https://github.com/apache/lucene/pull/14511#discussion_r2058399082



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##########
@@ -1286,14 +1298,11 @@ public long cost() {
 
           @Override
           public int numLevels() {
-            return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 
1 : 2;
+            return level1LastDocID == NO_MORE_DOCS ? 1 : 2;
           }
 
           @Override
           public int getDocIdUpTo(int level) {
-            if (indexHasFreq == false) {
-              return NO_MORE_DOCS;
-            }

Review Comment:
   Likewise, let's keep this?



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##########
@@ -1286,14 +1298,11 @@ public long cost() {
 
           @Override
           public int numLevels() {
-            return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 
1 : 2;
+            return level1LastDocID == NO_MORE_DOCS ? 1 : 2;

Review Comment:
   I would keep it the way it was for now. Exposing a single level that covers 
the whole doc ID space may cause inefficiencies with some scorers, but we 
should fix these scorers instead of tuning this class. If all docs have the 
same score, it should be just fine to return a single level of impact that 
covers the whole doc ID space.



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##########
@@ -282,6 +290,10 @@ public PostingsEnum postings(
   @Override
   public ImpactsEnum impacts(FieldInfo fieldInfo, BlockTermState state, int 
flags)
       throws IOException {
+    if (state.docFreq <= BLOCK_SIZE) {
+      // no skip data
+      return new SlowImpactsEnum(postings(fieldInfo, state, null, flags));
+    }

Review Comment:
   Let's not use `SlowImpactsEnum` again or it would cancel the speedup from 
https://github.com/apache/lucene/pull/14017 (annotation HJ at 
https://benchmarks.mikemccandless.com/OrHighHigh.html).



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##########
@@ -66,13 +67,20 @@
 public final class Lucene103PostingsReader extends PostingsReaderBase {
 
   static final VectorizationProvider VECTORIZATION_PROVIDER = 
VectorizationProvider.getInstance();
+
   // Dummy impacts, composed of the maximum possible term frequency and the 
lowest possible
   // (unsigned) norm value. This is typically used on tail blocks, which don't 
actually record
-  // impacts as the storage overhead would not be worth any query evaluation 
speedup, since there's
+  // impacts as the storage overhead would not be worth any query evaluation 
speedup, since
+  // there's
   // less than 128 docs left to evaluate anyway.
   private static final List<Impact> DUMMY_IMPACTS =
       Collections.singletonList(new Impact(Integer.MAX_VALUE, 1L));
 
+  // We stopped storing a placeholder impact with freq=1 for fields with 
IndexOptions.DOCS after
+  // 9.12.0
+  private static final List<Impact> NON_COMPETITIVE_IMPACTS =
+      Collections.singletonList(new Impact(1, 1L));

Review Comment:
   Maybe call it `DUMMY_IMPACTS_NO_FREQS` or something like that to draw a 
parallel with `DUMMY_IMPACTS` above?



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##########
@@ -1309,8 +1318,9 @@ public List<Impact> getImpacts(int level) {
               if (level == 1) {
                 return readImpacts(level1SerializedImpacts, level1Impacts);
               }
+              return DUMMY_IMPACTS;
             }
-            return DUMMY_IMPACTS;
+            return NON_COMPETITIVE_IMPACTS;

Review Comment:
   I would keep it the way it was before and just add this at the beginning of 
this method (similar to `getDocIdUpTo`):
   
   ```java
               if (indexHasFreq == false) {
                 return DUMMY_IMPACTS_NO_FREQS;
               }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]

Reply via email to