jpountz opened a new pull request, #13585:
URL: https://github.com/apache/lucene/pull/13585
This updates the postings format in order to inline skip data into postings.
This format is generally similar to the current `Lucene99PostingsFormat`, e.g.
it shares the same block encoding logic, but it has a few differences:
- Skip data is inlined into postings to make the access pattern more
sequential.
- There are only 2 levels of skip data: on every block (128 docs) and every
32 blocks (4096 docs).
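To illustrate the two skip levels described above, here is a minimal Java sketch that models skipping over an in-memory, sorted array of doc IDs. It is not the code from this PR: `TwoLevelSkipSketch`, its fields, and its stateless `advance(target)` method are hypothetical. Blocks are plain array slices rather than encoded blocks, and the sketch does not model inlining or the file pointers and impacts that real skip entries carry.

```java
// Hypothetical model of two-level skipping: level 0 has one entry per
// 128-doc block, level 1 has one entry per 32 blocks (4096 docs).
// Not Lucene's implementation; blocks are slices of a sorted int array.
final class TwoLevelSkipSketch {
  static final int BLOCK_SIZE = 128;       // docs per block (level 0)
  static final int BLOCKS_PER_LEVEL1 = 32; // blocks per level-1 entry (4096 docs)

  private final int[] docs;       // all doc IDs, sorted ascending
  private final int[] level0Last; // last doc ID of each block
  private final int[] level1Last; // last doc ID of each 32-block group

  TwoLevelSkipSketch(int[] docs) {
    this.docs = docs.clone();
    int numBlocks = (docs.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
    level0Last = new int[numBlocks];
    for (int b = 0; b < numBlocks; b++) {
      level0Last[b] = docs[Math.min(docs.length, (b + 1) * BLOCK_SIZE) - 1];
    }
    int numGroups = (numBlocks + BLOCKS_PER_LEVEL1 - 1) / BLOCKS_PER_LEVEL1;
    level1Last = new int[numGroups];
    for (int g = 0; g < numGroups; g++) {
      level1Last[g] = level0Last[Math.min(numBlocks, (g + 1) * BLOCKS_PER_LEVEL1) - 1];
    }
  }

  /** Returns the first doc ID >= target, or Integer.MAX_VALUE if there is none. */
  int advance(int target) {
    // Level 1: skip whole 4096-doc groups whose last doc is < target.
    int g = 0;
    while (g < level1Last.length && level1Last[g] < target) g++;
    if (g == level1Last.length) return Integer.MAX_VALUE;
    // Level 0: within that group, skip 128-doc blocks whose last doc is < target.
    // This stops inside group g because the group's last doc is >= target.
    int b = g * BLOCKS_PER_LEVEL1;
    while (level0Last[b] < target) b++;
    // Only block b needs to be "decoded" (here: scanned linearly).
    int end = Math.min(docs.length, (b + 1) * BLOCK_SIZE);
    for (int i = b * BLOCK_SIZE; i < end; i++) {
      if (docs[i] >= target) return docs[i];
    }
    return Integer.MAX_VALUE; // not reached: level0Last[b] >= target
  }

  public static void main(String[] args) {
    int[] docs = new int[10_000];
    for (int i = 0; i < docs.length; i++) docs[i] = i * 3; // every third doc matches
    TwoLevelSkipSketch postings = new TwoLevelSkipSketch(docs);
    System.out.println(postings.advance(5000)); // prints 5001
  }
}
```

A real postings enum advances from its current position rather than from the start; the sketch restarts from group 0 on every call only to keep it short.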
In general, I found that inlining skip data may slightly slow down queries that
don't need skip data at all (e.g. `CountOrXXX` tasks that never advance or
consult impacts) and slightly speed up queries that advance by small intervals.
Because the highest level only allows skipping 4096 docs at once, we're slower
at advancing by large intervals, but the data suggests that this doesn't
significantly hurt performance. Phrase queries and term queries sorted by field
are slower for reasons that I haven't understood yet.
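One rough way to see why capping the hierarchy at 4096 docs costs more on long jumps is to count how many skip entries a jump crosses. The Java sketch below is a simplified model, not a measurement: `SkipCostModel` and `entriesSkipped` are hypothetical names, the model counts only whole entries jumped over (ignoring the entry read to detect the overshoot), and any configuration other than `levels = 2` (for example, a third level every 32 level-1 entries) is hypothetical and included only for comparison.

```java
// Simplified model: count skip entries crossed when skipping n postings,
// using the highest level first. levels = 2 models 128-doc and 4096-doc
// entries; other level counts are hypothetical.
final class SkipCostModel {
  static final long BLOCK_SIZE = 128; // postings per level-0 skip entry
  static final long FANOUT = 32;      // lower-level entries per higher-level entry

  static long entriesSkipped(long n, int levels) {
    long span = BLOCK_SIZE;
    for (int i = 1; i < levels; i++) {
      span *= FANOUT; // postings covered by one entry of the top level
    }
    long total = 0;
    long remaining = n;
    for (int level = levels - 1; level >= 0; level--) {
      total += remaining / span; // whole entries skipped at this level
      remaining %= span;         // left over for the level below
      span /= FANOUT;
    }
    return total; // postings still in 'remaining' are scanned within one block
  }

  public static void main(String[] args) {
    System.out.println(entriesSkipped(1_000_000, 2)); // prints 248
    System.out.println(entriesSkipped(1_000_000, 3)); // prints 31
  }
}
```

Under this model, skipping 1,000,000 postings crosses 244 level-1 entries plus 4 level-0 entries with two levels, versus 31 entries with a hypothetical third level: the cost of a long jump grows linearly with its length once it exceeds 4096 docs.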
These results were produced on wikibigall without inter-segment concurrency.
```
Task                  QPS baseline   StdDev QPS my_modified_version   StdDev Pct diff                  p-value
HighTermTitleSort           152.82   (1.3%)                  105.67   (0.9%)   -30.9% ( -32% - -29%)     0.000
Phrase                       11.67   (5.2%)                   10.13   (4.2%)   -13.2% ( -21% - -4%)      0.000
CountOrHighHigh              56.79  (33.3%)                   49.41  (21.1%)   -13.0% ( -50% - 62%)      0.141
HighTermMonthSort          3598.70   (3.2%)                 3372.04   (2.7%)    -6.3% ( -11% - 0%)       0.000
CountOrHighMed              104.44  (21.2%)                   99.90  (18.1%)    -4.3% ( -36% - 44%)      0.486
Wildcard                     54.26   (3.0%)                   52.23   (2.6%)    -3.7% ( -9% - 1%)        0.000
TermDTSort                  349.67   (6.0%)                  339.57   (4.3%)    -2.9% ( -12% - 7%)       0.081
IntNRQ                      113.09  (21.2%)                  110.12  (21.6%)    -2.6% ( -37% - 51%)      0.699
CountTerm                  9104.21   (4.1%)                 8870.31   (6.0%)    -2.6% ( -12% - 7%)       0.115
Prefix3                     296.80   (1.9%)                  290.04   (2.0%)    -2.3% ( -6% - 1%)        0.000
HighTerm                    383.13   (5.2%)                  377.50   (7.5%)    -1.5% ( -13% - 11%)      0.472
PKLookup                    286.07   (1.5%)                  281.91   (2.1%)    -1.5% ( -4% - 2%)        0.012
HighTermDayOfYearSort       758.57   (2.6%)                  748.44   (2.9%)    -1.3% ( -6% - 4%)        0.121
HighTermTitleBDVSort         13.27   (4.9%)                   13.13   (6.2%)    -1.1% ( -11% - 10%)      0.546
Fuzzy1                       98.52   (1.7%)                   97.67   (2.1%)    -0.9% ( -4% - 3%)        0.154
AndHighHigh                  62.93   (1.9%)                   62.46   (1.5%)    -0.7% ( -4% - 2%)        0.164
Fuzzy2                       62.42   (1.5%)                   61.96   (1.9%)    -0.7% ( -4% - 2%)        0.184
Respell                      49.68   (1.3%)                   49.39   (1.5%)    -0.6% ( -3% - 2%)        0.171
Or2Terms2StopWords          157.28   (1.7%)                  157.04   (1.7%)    -0.2% ( -3% - 3%)        0.777
OrHighHigh                   72.02   (1.7%)                   72.21   (1.8%)     0.3% ( -3% - 3%)        0.642
AndStopWords                 29.81   (2.2%)                   29.94   (1.7%)     0.4% ( -3% - 4%)        0.495
And2Terms2StopWords         151.81   (1.5%)                  152.86   (1.8%)     0.7% ( -2% - 4%)        0.183
OrHighNotLow                384.08   (5.0%)                  388.68   (6.9%)     1.2% ( -10% - 13%)      0.531
OrHighNotHigh               210.18   (6.1%)                  213.18   (7.3%)     1.4% ( -11% - 15%)      0.502
OrHighNotMed                324.28   (5.3%)                  329.41   (6.8%)     1.6% ( -9% - 14%)       0.413
MedTerm                     567.00   (5.4%)                  578.90   (8.1%)     2.1% ( -10% - 16%)      0.333
CountPhrase                   3.24  (10.3%)                    3.31  (13.2%)     2.2% ( -19% - 28%)      0.551
LowTerm                     854.03   (4.9%)                  873.32   (7.2%)     2.3% ( -9% - 15%)       0.248
AndHighMed                  197.59   (1.5%)                  203.05   (2.2%)     2.8% ( 0% - 6%)         0.000
OrNotHighHigh               178.76   (6.5%)                  184.38   (7.5%)     3.1% ( -10% - 18%)      0.156
OrStopWords                  32.36   (2.8%)                   33.56   (1.7%)     3.7% ( 0% - 8%)         0.000
Or3Terms                    158.54   (1.6%)                  164.51   (2.1%)     3.8% ( 0% - 7%)         0.000
OrHighMed                   231.23   (1.8%)                  241.40   (2.9%)     4.4% ( 0% - 9%)         0.000
And3Terms                   157.12   (1.3%)                  164.32   (1.5%)     4.6% ( 1% - 7%)         0.000
OrHighLow                   732.71   (1.6%)                  786.67   (3.1%)     7.4% ( 2% - 12%)        0.000
OrNotHighMed                282.64   (6.5%)                  306.83   (8.5%)     8.6% ( -6% - 25%)       0.000
OrHighRare                  237.87   (7.8%)                  259.37   (4.6%)     9.0% ( -3% - 23%)       0.000
OrNotHighLow                833.05   (2.4%)                  946.10   (3.8%)    13.6% ( 7% - 20%)        0.000
CountAndHighHigh             41.24   (2.0%)                   46.91   (2.7%)    13.8% ( 8% - 18%)        0.000
AndHighLow                  748.77   (1.7%)                  870.25   (3.1%)    16.2% ( 11% - 21%)       0.000
CountAndHighMed             120.32   (2.0%)                  140.26   (3.5%)    16.6% ( 10% - 22%)       0.000
```
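As a reading aid for the table, the `Pct diff` column appears to match the relative change in mean QPS, `(candidate - baseline) / baseline`. The small Java helper below (hypothetical, not part of luceneutil) reproduces two rows from the table. It does not reproduce the bracketed range or the p-value, which come from the benchmark's own statistics.

```java
import java.util.Locale;

final class PctDiff {
  // Relative change of the candidate's mean QPS versus the baseline's, in percent.
  static double pctDiff(double baselineQps, double candidateQps) {
    return (candidateQps - baselineQps) / baselineQps * 100.0;
  }

  public static void main(String[] args) {
    // QPS values copied from the HighTermTitleSort and CountAndHighMed rows.
    System.out.printf(Locale.ROOT, "HighTermTitleSort: %.1f%%%n", pctDiff(152.82, 105.67)); // prints -30.9%
    System.out.printf(Locale.ROOT, "CountAndHighMed: %.1f%%%n", pctDiff(120.32, 140.26));   // prints 16.6%
  }
}
```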