jpountz commented on code in PR #14511:
URL: https://github.com/apache/lucene/pull/14511#discussion_r2058399082
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##########
@@ -1286,14 +1298,11 @@ public long cost() {
@Override
public int numLevels() {
- return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ?
1 : 2;
+ return level1LastDocID == NO_MORE_DOCS ? 1 : 2;
}
@Override
public int getDocIdUpTo(int level) {
- if (indexHasFreq == false) {
- return NO_MORE_DOCS;
- }
Review Comment:
Likewise, let's keep this?
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##########
@@ -1286,14 +1298,11 @@ public long cost() {
@Override
public int numLevels() {
- return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ?
1 : 2;
+ return level1LastDocID == NO_MORE_DOCS ? 1 : 2;
Review Comment:
I would keep it the way it was for now. Exposing a single level that covers
the whole doc ID space may cause inefficiencies with some scorers, but we
should fix these scorers instead of tuning this class. If all docs have the
same score, it should be just fine to return a single level of impact that
covers the whole doc ID space.
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##########
@@ -282,6 +290,10 @@ public PostingsEnum postings(
@Override
public ImpactsEnum impacts(FieldInfo fieldInfo, BlockTermState state, int
flags)
throws IOException {
+ if (state.docFreq <= BLOCK_SIZE) {
+ // no skip data
+ return new SlowImpactsEnum(postings(fieldInfo, state, null, flags));
+ }
Review Comment:
Let's not use `SlowImpactsEnum` again or it would cancel the speedup from
https://github.com/apache/lucene/pull/14017 (annotation HJ at
https://benchmarks.mikemccandless.com/OrHighHigh.html).
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##########
@@ -66,13 +67,20 @@
public final class Lucene103PostingsReader extends PostingsReaderBase {
static final VectorizationProvider VECTORIZATION_PROVIDER =
VectorizationProvider.getInstance();
+
// Dummy impacts, composed of the maximum possible term frequency and the
lowest possible
// (unsigned) norm value. This is typically used on tail blocks, which don't
actually record
- // impacts as the storage overhead would not be worth any query evaluation
speedup, since there's
+ // impacts as the storage overhead would not be worth any query evaluation
speedup, since
+ // there's
// less than 128 docs left to evaluate anyway.
private static final List<Impact> DUMMY_IMPACTS =
Collections.singletonList(new Impact(Integer.MAX_VALUE, 1L));
+ // We stopped storing a placeholder impact with freq=1 for fields with
IndexOptions.DOCS after
+ // 9.12.0
+ private static final List<Impact> NON_COMPETITIVE_IMPACTS =
+ Collections.singletonList(new Impact(1, 1L));
Review Comment:
Maybe call it `DUMMY_IMPACTS_NO_FREQS` or something like that to draw a
parallel with `DUMMY_IMPACTS` above?
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##########
@@ -1309,8 +1318,9 @@ public List<Impact> getImpacts(int level) {
if (level == 1) {
return readImpacts(level1SerializedImpacts, level1Impacts);
}
+ return DUMMY_IMPACTS;
}
- return DUMMY_IMPACTS;
+ return NON_COMPETITIVE_IMPACTS;
Review Comment:
I would keep it the way it was before and just add this at the beginning of
this method (similar to `getDocIdUpTo`):
```java
if (indexHasFreq == false) {
return DUMMY_IMPACTS_NO_FREQS;
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]