[ https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852876#comment-16852876 ]
Luca Cavanna edited comment on LUCENE-8796 at 5/31/19 10:08 AM:
----------------------------------------------------------------
I updated the PR and addressed all the comments. Here are the latest benchmark
results (with the bitset optimization disabled on both ends):
{noformat}
Report after iter 19:
Task                  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff
MedTerm                    1510.74   (6.8%)                  1457.20   (8.4%)     -3.5% ( -17% - 12%)
Fuzzy1                       70.49   (8.5%)                    68.11   (9.8%)     -3.4% ( -19% - 16%)
OrHighNotMed                650.57   (5.8%)                   629.81   (6.0%)     -3.2% ( -14% - 9%)
OrHighLow                   447.13   (4.2%)                   433.05   (4.5%)     -3.2% ( -11% - 5%)
OrNotHighMed                623.22   (6.3%)                   605.19   (6.1%)     -2.9% ( -14% - 10%)
OrHighNotLow                720.89   (7.0%)                   701.26   (7.9%)     -2.7% ( -16% - 13%)
OrNotHighHigh               558.43   (6.3%)                   544.82   (4.9%)     -2.4% ( -12% - 9%)
LowTerm                    1279.34   (4.9%)                  1248.60   (5.2%)     -2.4% ( -11% - 8%)
AndHighLow                  690.75   (4.0%)                   675.22   (5.3%)     -2.2% ( -11% - 7%)
LowPhrase                   358.90   (2.3%)                   351.28   (4.0%)     -2.1% ( -8% - 4%)
PKLookup                    139.97   (3.0%)                   137.32   (3.5%)     -1.9% ( -8% - 4%)
OrNotHighLow                728.48   (6.8%)                   714.79   (6.5%)     -1.9% ( -14% - 12%)
HighTerm                   1222.38   (6.3%)                  1199.77   (7.1%)     -1.8% ( -14% - 12%)
AndHighHigh                  58.93   (6.2%)                    58.01   (5.8%)     -1.6% ( -12% - 11%)
Prefix3                     152.21   (4.5%)                   150.00   (5.0%)     -1.5% ( -10% - 8%)
IntNRQConjMedTerm            79.15  (10.7%)                    78.06  (10.5%)     -1.4% ( -20% - 22%)
HighTermDayOfYearSort        95.28   (5.1%)                    94.10   (7.8%)     -1.2% ( -13% - 12%)
Wildcard                     64.23   (2.3%)                    63.45   (2.3%)     -1.2% ( -5% - 3%)
MedSpanNear                  81.15   (2.2%)                    80.19   (2.8%)     -1.2% ( -6% - 3%)
HighSpanNear                 10.20   (3.9%)                    10.08   (4.2%)     -1.2% ( -8% - 7%)
HighIntervalsOrdered          4.07   (1.8%)                     4.03   (2.2%)     -1.1% ( -4% - 2%)
LowSpanNear                  41.62   (3.1%)                    41.20   (3.6%)     -1.0% ( -7% - 5%)
IntNRQConjLowTerm            20.36   (4.1%)                    20.15   (4.5%)     -1.0% ( -9% - 7%)
IntNRQConjHighTerm           64.84   (9.6%)                    64.21   (9.4%)     -1.0% ( -18% - 19%)
AndHighMed                  229.08   (2.8%)                   227.00   (2.5%)     -0.9% ( -6% - 4%)
MedPhrase                    18.73   (1.5%)                    18.57   (2.3%)     -0.8% ( -4% - 2%)
LowSloppyPhrase             124.52   (2.3%)                   123.48   (2.6%)     -0.8% ( -5% - 4%)
Respell                      69.26   (3.0%)                    68.68   (2.9%)     -0.8% ( -6% - 5%)
HighPhrase                   12.98   (1.6%)                    12.88   (2.2%)     -0.7% ( -4% - 3%)
PrefixConjLowTerm            42.11   (2.6%)                    41.81   (3.0%)     -0.7% ( -6% - 5%)
OrHighNotHigh               680.34   (6.1%)                   676.16   (7.6%)     -0.6% ( -13% - 13%)
MedSloppyPhrase              34.06   (4.9%)                    33.89   (4.5%)     -0.5% ( -9% - 9%)
IntNRQ                       89.97  (12.4%)                    89.62  (12.0%)     -0.4% ( -22% - 27%)
HighSloppyPhrase              8.28   (4.0%)                     8.25   (3.9%)     -0.3% ( -7% - 7%)
WildcardConjLowTerm          36.35   (2.7%)                    36.26   (2.7%)     -0.3% ( -5% - 5%)
OrHighHigh                   27.89   (2.6%)                    27.85   (3.1%)     -0.1% ( -5% - 5%)
Fuzzy2                       44.19   (3.8%)                    44.17   (3.1%)     -0.1% ( -6% - 7%)
OrHighMed                    90.42   (2.8%)                    90.57   (2.8%)      0.2% ( -5% - 6%)
PrefixConjMedTerm            45.56   (2.8%)                    45.79   (2.9%)      0.5% ( -5% - 6%)
WildcardConjHighTerm         33.08   (2.6%)                    33.47   (3.0%)      1.2% ( -4% - 6%)
PrefixConjHighTerm           83.65   (2.6%)                    86.23   (3.7%)      3.1% ( -3% - 9%)
HighTermMonthSort           130.35  (15.8%)                   135.08  (12.1%)      3.6% ( -20% - 37%)
WildcardConjMedTerm          99.19   (3.6%)                   103.37   (4.1%)      4.2% ( -3% - 12%)
{noformat}
> Use exponential search in IntArrayDocIdSet advance method
> ---------------------------------------------------------
>
> Key: LUCENE-8796
> URL: https://issues.apache.org/jira/browse/LUCENE-8796
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Luca Cavanna
> Priority: Minor
>
> In a chat, [~jpountz] suggested improving IntArrayDocIdSet by making its
> advance method use exponential search instead of binary search. This should
> help the performance of queries that include conjunctions: because
> ConjunctionDISI leapfrogs between its iterators, it advances through doc IDs
> in small steps, so exponential search should on average be faster than
> binary search when advancing.
>
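To illustrate the idea in the description, here is a minimal sketch of exponential (galloping) search over a sorted int array. It is not the code from the PR and not Lucene's implementation: the class name, the method name advance, and its signature are hypothetical, and the array is assumed to hold distinct doc IDs in increasing order. The probe distance doubles from the current position until it overshoots the target, and a binary search then runs only over the last, small window.

```java
import java.util.Arrays;

// Hypothetical sketch; names and signature are illustrative assumptions.
final class ExponentialSearchSketch {

    /**
     * Returns the index of the first entry in docs[from, length) that is
     * greater than or equal to target, or length if there is no such entry.
     * Assumes docs[0, length) is sorted in increasing order without duplicates.
     */
    static int advance(int[] docs, int length, int from, int target) {
        int lo = from;   // every index in [from, lo) holds a value < target
        int step = 1;    // distance of the next probe from lo, doubled each round
        // Gallop forward while the probed entry is still below the target.
        while (step < length - lo && docs[lo + step] < target) {
            lo += step;
            step <<= 1;
        }
        // The answer now lies in [lo, hi): either the last probe was >= target
        // (so include it), or the probe ran past the end of the array.
        int hi = step < length - lo ? lo + step + 1 : length;
        int idx = Arrays.binarySearch(docs, lo, hi, target);
        // A negative result encodes the insertion point as -(point) - 1.
        return idx >= 0 ? idx : -(idx + 1);
    }
}
```

When the target is only a few entries ahead of the current position, as is expected with leapfrogging, the loop stops after one or two probes and the binary search covers just a handful of entries, instead of halving the whole remaining array.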