jpountz commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246112137
Skip data at level 0 now stores pointers into the pos/pay files instead of
incrementing posPendingCount by the total term freq of the block. This seems to
slow down term queries marginally and improve phrase queries a bit.
I also noticed that we would sometimes decode the same block of positions
multiple times when it's shared by two doc blocks: when moving to the next doc
block, we reset the position FP to the start of the pos block and decode it
again, even though it was already decoded. This looks like an existing issue in
Lucene99 too, but fixing it only yielded a minor speedup.
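
As a rough illustration of the re-decoding fix described above, here is a minimal sketch, assuming the level-0 skip entry for a doc block gives the file pointer of its pos block plus an offset into that block. The class `PosBlockReader`, its method names, and the `decoder` callback are hypothetical and are not taken from Lucene's postings reader; the sketch only shows the idea of remembering which block is already decoded.

```java
import java.util.function.LongFunction;

/** Hypothetical sketch; not Lucene's actual postings reader. */
final class PosBlockReader {
  // Assumed callback: decodes the block of positions that starts at a file pointer.
  private final LongFunction<int[]> decoder;
  private long decodedBlockFP = -1; // FP of the pos block currently held in posBuffer
  private int[] posBuffer;
  private int posBufferUpto;        // offset of the next position to read in posBuffer
  private int decodeCount;          // number of real decodes, kept for illustration

  PosBlockReader(LongFunction<int[]> decoder) {
    this.decoder = decoder;
  }

  /**
   * Moves to a doc block whose (assumed) level-0 skip entry points at
   * posBlockFP, with the first relevant position at index posUpto.
   */
  void moveToDocBlock(long posBlockFP, int posUpto) {
    if (posBlockFP != decodedBlockFP) {
      // Decode only when the target differs from the block already in memory.
      posBuffer = decoder.apply(posBlockFP);
      decodedBlockFP = posBlockFP;
      decodeCount++;
    }
    // When two doc blocks share a pos block, only the offset changes.
    posBufferUpto = posUpto;
  }

  int posBufferUpto() {
    return posBufferUpto;
  }

  int decodeCount() {
    return decodeCount;
  }
}
```

With this shape, two consecutive doc blocks that share a pos block trigger a single decode, and the second call only updates the offset.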
luceneutil now gives this on wikibigall:
```
Task                  QPS baseline   StdDev  QPS my_modified_version   StdDev Pct diff                 p-value
CountOrHighHigh              54.55  (28.7%)                    48.60  (18.8%)   -10.9%  (-45% -  51%)    0.155
HighTermMonthSort          3730.94   (3.2%)                  3388.64   (2.2%)    -9.2%  (-14% -  -3%)    0.000
Prefix3                      35.43   (5.2%)                    34.45   (3.3%)    -2.8%  (-10% -   6%)    0.044
CountOrHighMed              102.26  (20.1%)                    99.71  (16.3%)    -2.5%  (-32% -  42%)    0.666
Wildcard                    106.00   (4.9%)                   103.94   (3.7%)    -1.9%  ( -9% -   6%)    0.154
OrHighNotLow                345.58   (4.7%)                   339.60   (6.2%)    -1.7%  (-12% -   9%)    0.321
HighTermTitleSort           132.39   (6.1%)                   130.53   (2.2%)    -1.4%  ( -9% -   7%)    0.329
AndHighHigh                  62.75   (1.7%)                    62.14   (1.8%)    -1.0%  ( -4% -   2%)    0.078
CountTerm                  9407.38   (4.8%)                  9317.61   (3.1%)    -1.0%  ( -8% -   7%)    0.454
TermDTSort                  342.50   (3.8%)                   339.94   (4.1%)    -0.7%  ( -8% -   7%)    0.548
PKLookup                    289.43   (1.8%)                   287.33   (2.8%)    -0.7%  ( -5% -   3%)    0.330
Respell                      50.56   (1.8%)                    50.26   (2.1%)    -0.6%  ( -4% -   3%)    0.339
Fuzzy2                       70.03   (1.5%)                    69.67   (1.5%)    -0.5%  ( -3% -   2%)    0.271
HighTermDayOfYearSort       801.40   (2.3%)                   797.96   (2.5%)    -0.4%  ( -5% -   4%)    0.570
HighTerm                    442.74   (5.8%)                   441.35   (6.1%)    -0.3%  (-11% -  12%)    0.867
Fuzzy1                       74.12   (1.6%)                    74.28   (1.6%)     0.2%  ( -2% -   3%)    0.665
HighTermTitleBDVSort         14.58   (7.7%)                    14.61   (7.2%)     0.3%  (-13% -  16%)    0.914
AndStopWords                 29.90   (2.0%)                    29.99   (1.8%)     0.3%  ( -3% -   4%)    0.626
Or2Terms2StopWords          159.18   (1.4%)                   160.08   (1.2%)     0.6%  ( -1% -   3%)    0.159
OrHighHigh                   66.89   (1.7%)                    67.38   (1.6%)     0.7%  ( -2% -   4%)    0.155
And2Terms2StopWords         152.71   (1.5%)                   154.79   (1.3%)     1.4%  ( -1% -   4%)    0.002
OrHighNotMed                322.27   (5.3%)                   327.40   (7.3%)     1.6%  (-10% -  15%)    0.432
MedTerm                     560.59   (6.6%)                   571.19   (7.5%)     1.9%  (-11% -  17%)    0.399
OrHighNotHigh               233.42   (6.3%)                   239.76   (7.7%)     2.7%  (-10% -  17%)    0.223
IntNRQ                      140.44  (18.3%)                   144.57  (18.8%)     2.9%  (-28% -  49%)    0.616
AndHighMed                  151.63   (1.5%)                   156.15   (1.5%)     3.0%  (  0% -   6%)    0.000
OrStopWords                  32.73   (1.9%)                    33.84   (1.6%)     3.4%  (  0% -   7%)    0.000
LowTerm                     972.10   (5.9%)                  1005.36   (6.5%)     3.4%  ( -8% -  16%)    0.081
Phrase                       11.68   (4.5%)                    12.08   (4.1%)     3.4%  ( -4% -  12%)    0.012
OrHighMed                   199.56   (1.8%)                   207.12   (2.1%)     3.8%  (  0% -   7%)    0.000
OrNotHighHigh               214.79   (6.5%)                   223.81   (8.2%)     4.2%  ( -9% -  20%)    0.073
Or3Terms                    159.96   (1.5%)                   167.15   (1.4%)     4.5%  (  1% -   7%)    0.000
And3Terms                   157.48   (1.6%)                   165.21   (1.5%)     4.9%  (  1% -   8%)    0.000
CountPhrase                   3.33  (11.8%)                     3.56  (13.9%)     6.7%  (-16% -  36%)    0.100
OrHighLow                   695.89   (2.2%)                   743.44   (2.7%)     6.8%  (  1% -  12%)    0.000
OrHighRare                  242.91   (4.0%)                   263.09   (4.2%)     8.3%  (  0% -  17%)    0.000
OrNotHighMed                258.59   (6.9%)                   285.71   (9.6%)    10.5%  ( -5% -  28%)    0.000
CountAndHighHigh             41.68   (2.3%)                    47.42   (2.9%)    13.8%  (  8% -  19%)    0.000
AndHighLow                  913.56   (2.2%)                  1063.18   (2.3%)    16.4%  ( 11% -  21%)    0.000
OrNotHighLow                843.74   (1.7%)                   982.67   (3.5%)    16.5%  ( 11% -  22%)    0.000
CountAndHighMed             121.18   (2.0%)                   142.55   (3.1%)    17.6%  ( 12% -  23%)    0.000
```