jpountz opened a new pull request, #13605:
URL: https://github.com/apache/lucene/pull/13605
It's been pointed out multiple times that one difference between Tantivy and
Lucene is that Tantivy uses windows of 4,096 docs, while Lucene uses windows
half that size (2,048 docs), and that this might explain part of the
performance difference. luceneutil suggests that bumping the window size to
4,096 does indeed improve performance for counting queries, but not for top-k
queries. I'd still suggest bumping the window size across the board so that
our disjunction scorers stay consistent.
```
Task                   QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                 p-value
CountPhrase                    3.27  (11.6%)                     3.14   (8.0%)     -4.1% ( -21% -   17%)   0.189
HighTermMonthSort           3521.28   (3.5%)                  3481.74   (2.8%)     -1.1% (  -7% -    5%)   0.262
PKLookup                     289.42   (1.3%)                   286.47   (2.2%)     -1.0% (  -4% -    2%)   0.075
TermDTSort                   352.01   (6.5%)                   348.89   (5.6%)     -0.9% ( -12% -   11%)   0.642
Phrase                        11.85   (5.3%)                    11.76   (5.0%)     -0.8% ( -10% -    9%)   0.634
OrHighLow                    772.82   (2.4%)                   767.24   (2.1%)     -0.7% (  -5% -    3%)   0.313
CountAndHighMed              120.78   (2.3%)                   120.10   (2.5%)     -0.6% (  -5% -    4%)   0.449
HighTermDayOfYearSort        821.48   (3.5%)                   818.62   (2.7%)     -0.3% (  -6% -    6%)   0.724
HighTermTitleSort            148.84   (2.9%)                   148.33   (2.8%)     -0.3% (  -5% -    5%)   0.700
AndHighHigh                   62.36   (1.7%)                    62.17   (1.8%)     -0.3% (  -3% -    3%)   0.584
CountAndHighHigh              41.41   (2.5%)                    41.34   (2.6%)     -0.2% (  -5% -    5%)   0.836
Fuzzy1                        96.24   (1.0%)                    96.09   (1.2%)     -0.2% (  -2% -    2%)   0.667
AndHighLow                   827.59   (2.7%)                   826.89   (2.4%)     -0.1% (  -5% -    5%)   0.918
AndHighMed                    93.35   (1.6%)                    93.29   (1.7%)     -0.1% (  -3% -    3%)   0.903
HighTermTitleBDVSort          16.30   (4.2%)                    16.29   (6.7%)     -0.0% ( -10% -   11%)   0.984
OrHighMed                    153.42   (2.6%)                   153.41   (2.2%)     -0.0% (  -4% -    4%)   0.994
Respell                       46.72   (1.3%)                    46.72   (1.4%)      0.0% (  -2% -    2%)   0.975
And3Terms                    155.73   (2.2%)                   155.95   (1.4%)      0.1% (  -3% -    3%)   0.805
Fuzzy2                        58.66   (0.9%)                    58.77   (1.1%)      0.2% (  -1% -    2%)   0.566
OrHighHigh                    75.70   (2.6%)                    75.90   (2.3%)      0.3% (  -4% -    5%)   0.733
CountTerm                   9110.00   (4.3%)                  9142.10   (3.2%)      0.4% (  -6% -    8%)   0.768
AndStopWords                  29.47   (2.6%)                    29.57   (1.3%)      0.4% (  -3% -    4%)   0.579
And2Terms2StopWords          150.30   (2.1%)                   150.86   (1.1%)      0.4% (  -2% -    3%)   0.487
OrHighRare                   237.33   (5.7%)                   238.26   (6.2%)      0.4% ( -10% -   13%)   0.837
MedTerm                      553.55   (6.0%)                   555.97   (7.7%)      0.4% ( -12% -   15%)   0.841
Wildcard                      34.08   (3.2%)                    34.25   (3.4%)      0.5% (  -5% -    7%)   0.630
OrNotHighLow                 761.70   (3.2%)                   766.33   (2.6%)      0.6% (  -5% -    6%)   0.511
Or2Terms2StopWords           156.10   (3.2%)                   157.14   (1.8%)      0.7% (  -4% -    5%)   0.416
Or3Terms                     156.59   (3.0%)                   157.70   (1.9%)      0.7% (  -4% -    5%)   0.374
HighTerm                     440.27   (5.6%)                   443.89   (7.5%)      0.8% ( -11% -   14%)   0.695
LowTerm                      892.27   (5.2%)                   900.48   (6.8%)      0.9% ( -10% -   13%)   0.632
OrStopWords                   31.88   (4.7%)                    32.29   (2.6%)      1.3% (  -5% -    9%)   0.276
Prefix3                      214.22   (3.4%)                   217.48   (2.8%)      1.5% (  -4% -    8%)   0.124
OrHighNotHigh                247.52   (4.8%)                   254.52   (5.1%)      2.8% (  -6% -   13%)   0.071
IntNRQ                       144.53  (17.2%)                   148.66  (17.9%)      2.9% ( -27% -   45%)   0.607
OrNotHighMed                 330.23   (6.5%)                   340.12   (5.4%)      3.0% (  -8% -   15%)   0.114
OrHighNotMed                 285.11   (5.2%)                   293.82   (6.2%)      3.1% (  -7% -   15%)   0.092
OrHighNotLow                 429.94   (5.4%)                   443.15   (6.8%)      3.1% (  -8% -   16%)   0.113
OrNotHighHigh                189.30   (5.9%)                   195.25   (5.4%)      3.1% (  -7% -   15%)   0.079
CountOrHighMed                99.90  (22.5%)                   121.78  (20.0%)     21.9% ( -16% -   83%)   0.001
CountOrHighHigh               53.76  (35.1%)                    70.24  (32.5%)     30.6% ( -27% -  151%)   0.004
```
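The sketch below is a hypothetical, simplified Java illustration of what "windows of docs" means for counting a disjunction. Each clause marks its matching docs in a bitset that covers one window. The window's matches are then counted before the scorer moves on to the next window. The idea is loosely modeled on Lucene's window-based disjunction scorer (`BooleanScorer`), but the class, the method, and the `int[]` posting-list representation are assumptions made for illustration, not Lucene's API. `WINDOW_SIZE` stands in for the knob this PR changes.

```java
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical, simplified sketch of window-based disjunction counting.
 * Not Lucene code: the class, the method and the int[] posting lists are
 * illustrative assumptions.
 */
public class WindowedDisjunctionCount {

  // Window size in docs; this PR proposes going from 2,048 to 4,096.
  static final int WINDOW_SIZE = 4096;

  /**
   * Counts docs in [0, maxDoc) that match at least one clause.
   * Each posting list must be sorted ascending, with all doc IDs below maxDoc.
   */
  public static long countDisjunction(List<int[]> postings, int maxDoc) {
    int[] cursors = new int[postings.size()]; // next unread index per clause
    BitSet window = new BitSet(WINDOW_SIZE);
    long count = 0;
    for (int windowMin = 0; windowMin < maxDoc; windowMin += WINDOW_SIZE) {
      int windowMax = Math.min(windowMin + WINDOW_SIZE, maxDoc);
      window.clear();
      // Each clause marks its matches within [windowMin, windowMax);
      // a doc matched by several clauses sets the same bit only once.
      for (int c = 0; c < postings.size(); c++) {
        int[] docs = postings.get(c);
        int i = cursors[c];
        while (i < docs.length && docs[i] < windowMax) {
          window.set(docs[i] - windowMin);
          i++;
        }
        cursors[c] = i;
      }
      // The union of all clauses for this window is the set of marked bits.
      count += window.cardinality();
    }
    return count;
  }

  public static void main(String[] args) {
    List<int[]> postings = List.of(
        new int[] {1, 5, 4100, 9000},
        new int[] {5, 6, 4100, 8191});
    System.out.println(countDisjunction(postings, 10_000)); // prints 6
  }
}
```

One plausible reading of the counting results, offered as an assumption rather than something measured here: with a 4,096-doc window, the per-window work (clearing the bitset, looping over clauses, counting bits) is amortized over twice as many docs. The bitset itself stays small at 4,096 bits, or 512 bytes.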
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]