[
https://issues.apache.org/jira/browse/LUCENE-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrien Grand updated LUCENE-7939:
---------------------------------
Attachment: LUCENE-7939.patch
Here is a more complete benchmark that also includes conjunctions with a
low-frequency term, in order to better show the impact of the patch.
{noformat}
Task                         QPS baseline  StdDev  QPS patch  StdDev  Pct diff
HighConjLow4MinShouldMatch3  238.10  (3.2%)  232.20  (2.2%)   -2.5% ( -7% - 2%)
Low2MinShouldMatch3            5.06  (3.2%)    4.96  (1.9%)   -1.9% ( -6% - 3%)
Low1MinShouldMatch0            3.69  (4.1%)    3.64  (3.6%)   -1.2% ( -8% - 6%)
Low1MinShouldMatch3            3.82  (4.6%)    3.77  (4.5%)   -1.2% ( -9% - 8%)
HighMinShouldMatch3            3.28  (4.6%)    3.24  (4.3%)   -1.1% ( -9% - 8%)
HighMinShouldMatch4            3.46  (5.1%)    3.42  (5.0%)   -1.1% ( -10% - 9%)
HighMinShouldMatch2            3.17  (4.4%)    3.14  (3.8%)   -1.1% ( -8% - 7%)
Low2MinShouldMatch2            4.43  (4.1%)    4.39  (3.9%)   -1.1% ( -8% - 7%)
Low2MinShouldMatch0            4.50  (4.3%)    4.45  (3.7%)   -1.0% ( -8% - 7%)
HighMinShouldMatch0            3.23  (4.4%)    3.20  (3.8%)   -1.0% ( -8% - 7%)
Low3MinShouldMatch2            6.21  (4.1%)    6.15  (3.8%)   -1.0% ( -8% - 7%)
Low4MinShouldMatch2           43.17  (1.5%)   42.74  (1.5%)   -1.0% ( -3% - 2%)
Low1MinShouldMatch2            3.62  (4.0%)    3.59  (3.7%)   -1.0% ( -8% - 7%)
Low1MinShouldMatch4            5.15  (3.2%)    5.10  (1.8%)   -1.0% ( -5% - 4%)
Low3MinShouldMatch0            6.27  (4.3%)    6.22  (3.7%)   -0.8% ( -8% - 7%)
HighConjLow3MinShouldMatch4  395.48  (2.8%)  392.50  (1.7%)   -0.8% ( -5% - 3%)
Low4MinShouldMatch0           10.49  (4.6%)   10.42  (4.0%)   -0.7% ( -8% - 8%)
Low3MinShouldMatch3           39.29  (1.6%)   39.14  (1.9%)   -0.4% ( -3% - 3%)
Low4MinShouldMatch4          439.43  (3.1%)  439.31  (2.0%)   -0.0% ( -4% - 5%)
Low2MinShouldMatch4           46.39  (1.9%)   46.42  (2.2%)    0.1% ( -3% - 4%)
Low4MinShouldMatch3          251.09  (3.0%)  251.28  (2.3%)    0.1% ( -5% - 5%)
Low3MinShouldMatch4          407.65  (2.6%)  408.19  (2.0%)    0.1% ( -4% - 4%)
HighConjLow4MinShouldMatch4  431.30  (3.3%)  431.89  (1.8%)    0.1% ( -4% - 5%)
LowConjLow2MinShouldMatch0    34.97  (2.0%)   35.05  (2.0%)    0.2% ( -3% - 4%)
LowConjHighMinShouldMatch0    24.07  (2.2%)   24.14  (2.2%)    0.3% ( -4% - 4%)
LowConjLow3MinShouldMatch0    47.30  (1.9%)   47.47  (2.0%)    0.4% ( -3% - 4%)
LowConjLow1MinShouldMatch0    28.21  (2.1%)   28.31  (2.1%)    0.4% ( -3% - 4%)
HighConjHighMinShouldMatch0    5.45  (1.5%)    5.48  (1.6%)    0.6% ( -2% - 3%)
HighConjLow1MinShouldMatch0    6.34  (1.4%)    6.38  (1.7%)    0.6% ( -2% - 3%)
LowConjLow4MinShouldMatch0    75.37  (1.4%)   75.83  (1.7%)    0.6% ( -2% - 3%)
HighConjLow2MinShouldMatch0    7.82  (1.3%)    7.89  (1.8%)    1.0% ( -2% - 4%)
HighConjLow3MinShouldMatch0   11.04  (1.3%)   11.19  (2.2%)    1.4% ( -2% - 4%)
HighConjLow4MinShouldMatch0   17.11  (1.2%)   17.51  (2.7%)    2.3% ( -1% - 6%)
HighConjHighMinShouldMatch2    4.82  (1.6%)    4.94  (2.3%)    2.5% ( -1% - 6%)
LowConjHighMinShouldMatch2    23.41  (2.1%)   24.16  (2.2%)    3.2% ( -1% - 7%)
HighConjLow1MinShouldMatch2    5.54  (1.4%)    5.75  (2.1%)    3.7% ( 0% - 7%)
LowConjLow1MinShouldMatch2    27.31  (2.0%)   28.54  (2.2%)    4.5% ( 0% - 8%)
HighConjLow2MinShouldMatch2    7.14  (1.6%)    7.64  (2.3%)    7.1% ( 3% - 11%)
LowConjLow2MinShouldMatch2    34.08  (2.1%)   36.95  (2.3%)    8.4% ( 4% - 13%)
HighConjLow3MinShouldMatch2   10.97  (2.0%)   12.56  (2.8%)   14.5% ( 9% - 19%)
HighConjHighMinShouldMatch3    4.79  (2.2%)    5.61  (2.7%)   17.0% ( 11% - 22%)
LowConjHighMinShouldMatch3    23.04  (2.0%)   27.15  (2.6%)   17.9% ( 13% - 22%)
LowConjLow3MinShouldMatch2    47.31  (2.0%)   59.02  (2.6%)   24.7% ( 19% - 29%)
HighConjLow1MinShouldMatch3    5.63  (2.7%)    7.09  (3.1%)   25.9% ( 19% - 32%)
HighConjLow2MinShouldMatch4   44.44  (2.1%)   56.15  (2.8%)   26.4% ( 20% - 31%)
LowConjLow4MinShouldMatch4   433.71  (3.7%)  554.58  (3.6%)   27.9% ( 19% - 36%)
LowConjLow1MinShouldMatch3    26.52  (1.9%)   34.26  (2.8%)   29.2% ( 24% - 34%)
HighConjLow4MinShouldMatch2   40.97  (1.6%)   55.81  (2.4%)   36.2% ( 31% - 40%)
HighConjLow2MinShouldMatch3    7.81  (2.9%)   10.80  (3.1%)   38.3% ( 31% - 45%)
HighConjLow3MinShouldMatch3   37.26  (1.9%)   52.29  (2.8%)   40.3% ( 34% - 45%)
LowConjLow3MinShouldMatch4   398.61  (3.3%)  560.67  (4.6%)   40.7% ( 31% - 50%)
HighConjHighMinShouldMatch4    5.01  (2.9%)    7.28  (2.9%)   45.4% ( 38% - 52%)
LowConjLow4MinShouldMatch3   247.43  (3.0%)  395.78  (4.7%)   60.0% ( 50% - 69%)
LowConjLow2MinShouldMatch3    32.48  (1.9%)   53.52  (3.9%)   64.8% ( 57% - 71%)
HighConjLow1MinShouldMatch4    6.67  (3.1%)   11.15  (3.5%)   67.2% ( 58% - 76%)
LowConjHighMinShouldMatch4    21.88  (1.8%)   36.60  (3.8%)   67.3% ( 60% - 74%)
LowConjLow1MinShouldMatch4    23.73  (1.8%)   56.91  (6.1%)  139.8% ( 129% - 150%)
LowConjLow4MinShouldMatch2    78.45  (1.6%)  289.55 (10.0%)  269.1% ( 253% - 285%)
LowConjLow3MinShouldMatch3    55.74  (1.9%)  367.43 (17.9%)  559.2% ( 529% - 590%)
LowConjLow2MinShouldMatch4    53.12  (2.1%)  496.43 (29.6%)  834.5% ( 786% - 884%)
{noformat}
I improved the patch a bit so that advance() returns a better doc ID: the
minShouldMatch-th lowest doc ID. This is how I had done it initially, but for
some reason it seemed slower at the time. This time I compared the previous
patch (baseline) against the new patch, and performance looks _very_
similar.
{noformat}
Task                         QPS baseline  StdDev  QPS patch  StdDev  Pct diff
LowConjLow4MinShouldMatch3   394.61  (3.3%)  390.19  (3.1%)   -1.1% ( -7% - 5%)
LowConjLow4MinShouldMatch2   284.16  (2.6%)  281.31  (2.4%)   -1.0% ( -5% - 4%)
LowConjLow3MinShouldMatch4   554.91  (2.4%)  549.88  (2.9%)   -0.9% ( -6% - 4%)
Low4MinShouldMatch4          431.80  (2.9%)  428.16  (3.4%)   -0.8% ( -6% - 5%)
Low4MinShouldMatch3          246.52  (3.1%)  244.78  (3.4%)   -0.7% ( -6% - 5%)
LowConjLow3MinShouldMatch0    46.84  (1.4%)   46.58  (1.9%)   -0.6% ( -3% - 2%)
LowConjLow2MinShouldMatch2    36.51  (1.7%)   36.31  (2.1%)   -0.6% ( -4% - 3%)
HighConjLow2MinShouldMatch2    7.58  (2.2%)    7.54  (2.3%)   -0.6% ( -4% - 4%)
LowConjLow1MinShouldMatch4    56.08  (2.5%)   55.80  (2.4%)   -0.5% ( -5% - 4%)
LowConjLow1MinShouldMatch0    27.91  (1.8%)   27.77  (1.8%)   -0.5% ( -4% - 3%)
LowConjLow4MinShouldMatch4   546.13  (3.8%)  543.52  (3.3%)   -0.5% ( -7% - 6%)
LowConjLow3MinShouldMatch2    58.27  (2.0%)   58.01  (1.9%)   -0.5% ( -4% - 3%)
HighConjLow2MinShouldMatch3   10.70  (2.3%)   10.66  (2.3%)   -0.4% ( -4% - 4%)
HighConjLow1MinShouldMatch4   11.05  (2.3%)   11.00  (2.2%)   -0.4% ( -4% - 4%)
HighConjLow3MinShouldMatch4  385.64  (3.1%)  384.08  (3.7%)   -0.4% ( -7% - 6%)
LowConjHighMinShouldMatch3    26.84  (2.2%)   26.74  (1.8%)   -0.4% ( -4% - 3%)
Low2MinShouldMatch3            4.90  (2.4%)    4.89  (2.7%)   -0.4% ( -5% - 4%)
HighConjLow3MinShouldMatch2   12.44  (2.4%)   12.40  (2.0%)   -0.3% ( -4% - 4%)
HighConjLow2MinShouldMatch4   55.38  (2.3%)   55.20  (1.9%)   -0.3% ( -4% - 3%)
HighConjLow3MinShouldMatch3   51.66  (2.0%)   51.49  (1.8%)   -0.3% ( -4% - 3%)
LowConjLow3MinShouldMatch3   361.89  (3.1%)  360.91  (2.7%)   -0.3% ( -5% - 5%)
LowConjLow2MinShouldMatch4   484.47  (2.8%)  483.24  (3.3%)   -0.3% ( -6% - 6%)
Low2MinShouldMatch4           45.70  (2.2%)   45.59  (1.6%)   -0.2% ( -3% - 3%)
Low1MinShouldMatch0            3.62  (4.5%)    3.61  (4.4%)   -0.2% ( -8% - 9%)
Low3MinShouldMatch3           38.62  (1.4%)   38.53  (1.4%)   -0.2% ( -2% - 2%)
LowConjHighMinShouldMatch0    23.80  (1.8%)   23.75  (1.8%)   -0.2% ( -3% - 3%)
HighMinShouldMatch2            3.10  (4.8%)    3.10  (4.6%)   -0.2% ( -9% - 9%)
HighConjLow1MinShouldMatch2    5.71  (2.3%)    5.69  (2.3%)   -0.2% ( -4% - 4%)
LowConjLow2MinShouldMatch0    34.54  (2.1%)   34.48  (1.6%)   -0.2% ( -3% - 3%)
HighConjHighMinShouldMatch4    7.24  (2.3%)    7.23  (2.2%)   -0.2% ( -4% - 4%)
Low1MinShouldMatch2            3.54  (4.5%)    3.54  (4.5%)   -0.1% ( -8% - 9%)
HighConjHighMinShouldMatch2    4.90  (2.2%)    4.90  (2.1%)   -0.1% ( -4% - 4%)
HighConjLow1MinShouldMatch3    7.04  (2.4%)    7.03  (2.3%)   -0.1% ( -4% - 4%)
HighConjLow4MinShouldMatch2   55.15  (2.1%)   55.09  (1.7%)   -0.1% ( -3% - 3%)
HighConjLow3MinShouldMatch0   11.00  (2.3%)   10.99  (2.3%)   -0.1% ( -4% - 4%)
Low3MinShouldMatch0            6.14  (4.6%)    6.14  (4.4%)   -0.1% ( -8% - 9%)
Low1MinShouldMatch4            5.03  (2.6%)    5.03  (2.8%)   -0.1% ( -5% - 5%)
LowConjHighMinShouldMatch2    23.86  (2.2%)   23.84  (1.8%)   -0.1% ( -3% - 3%)
HighConjLow1MinShouldMatch0    6.24  (2.5%)    6.24  (2.7%)   -0.0% ( -5% - 5%)
Low3MinShouldMatch4          398.82  (3.5%)  398.66  (2.9%)   -0.0% ( -6% - 6%)
LowConjLow1MinShouldMatch2    28.20  (1.9%)   28.19  (1.8%)   -0.0% ( -3% - 3%)
HighMinShouldMatch0            3.16  (4.9%)    3.16  (4.8%)   -0.0% ( -9% - 10%)
HighMinShouldMatch3            3.20  (5.3%)    3.19  (5.2%)   -0.0% ( -10% - 11%)
HighConjLow4MinShouldMatch4  426.74  (3.2%)  426.65  (3.4%)   -0.0% ( -6% - 6%)
HighMinShouldMatch4            3.36  (5.8%)    3.36  (5.6%)   -0.0% ( -10% - 11%)
LowConjLow1MinShouldMatch3    33.74  (2.4%)   33.74  (1.8%)   -0.0% ( -4% - 4%)
Low4MinShouldMatch0           10.28  (4.9%)   10.28  (4.7%)    0.0% ( -9% - 10%)
HighConjLow4MinShouldMatch3  229.78  (3.0%)  229.82  (2.4%)    0.0% ( -5% - 5%)
LowConjHighMinShouldMatch4    36.04  (2.3%)   36.05  (2.1%)    0.0% ( -4% - 4%)
LowConjLow4MinShouldMatch0    74.47  (1.9%)   74.48  (2.0%)    0.0% ( -3% - 3%)
Low4MinShouldMatch2           42.32  (1.4%)   42.33  (1.5%)    0.0% ( -2% - 2%)
HighConjHighMinShouldMatch3    5.57  (2.6%)    5.57  (2.3%)    0.0% ( -4% - 5%)
HighConjLow4MinShouldMatch0   17.19  (2.9%)   17.19  (2.9%)    0.0% ( -5% - 6%)
LowConjLow2MinShouldMatch3    52.65  (2.4%)   52.68  (2.1%)    0.0% ( -4% - 4%)
HighConjLow2MinShouldMatch0    7.75  (2.3%)    7.75  (2.2%)    0.1% ( -4% - 4%)
HighConjHighMinShouldMatch0    5.37  (2.5%)    5.37  (2.6%)    0.1% ( -4% - 5%)
Low2MinShouldMatch2            4.33  (4.7%)    4.33  (4.7%)    0.1% ( -8% - 9%)
Low3MinShouldMatch2            6.06  (4.6%)    6.06  (4.7%)    0.1% ( -8% - 9%)
Low2MinShouldMatch0            4.40  (4.8%)    4.40  (4.5%)    0.1% ( -8% - 9%)
Low1MinShouldMatch3            3.71  (5.3%)    3.72  (4.9%)    0.2% ( -9% - 10%)
{noformat}
Given that returning a better next candidate is safer, I'd like to go with the
new patch.
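
For illustration only, here is a minimal standalone sketch (not the patch
itself) of why the minShouldMatch-th lowest doc ID is a safe candidate. It
assumes every optional clause has already been advanced to the target, so any
document before the returned ID is matched by fewer than minShouldMatch
clauses. The class and method names are hypothetical and nothing here uses
Lucene's API.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Lucene or of the patch.
final class MinShouldMatchCandidate {

    /**
     * Given the current doc ID of each optional clause (each one already
     * positioned on its first doc >= target), returns the minShouldMatch-th
     * lowest of them. Fewer than minShouldMatch clauses are positioned before
     * that doc ID, so no earlier document can satisfy the constraint.
     */
    static int nextCandidate(int[] currentDocIds, int minShouldMatch) {
        if (minShouldMatch < 1 || minShouldMatch > currentDocIds.length) {
            throw new IllegalArgumentException(
                "minShouldMatch must be between 1 and the number of clauses");
        }
        int[] sorted = currentDocIds.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        return sorted[minShouldMatch - 1];
    }

    public static void main(String[] args) {
        int[] positions = {12, 40, 7, 95};
        System.out.println(nextCandidate(positions, 1)); // 7
        System.out.println(nextCandidate(positions, 2)); // 12
        System.out.println(nextCandidate(positions, 3)); // 40
    }
}
```

With minShouldMatch=1 this degenerates to the lowest doc ID; higher values of
minShouldMatch let the scorer skip further ahead. Sorting is used here only
for brevity; it is an assumption of the sketch, not a statement about how the
scorer tracks its clauses.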
> Speed up MinShouldMatchSumScorer in conjunctions
> ------------------------------------------------
>
> Key: LUCENE-7939
> URL: https://issues.apache.org/jira/browse/LUCENE-7939
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7939.patch, LUCENE-7939.patch
>
>
> MinShouldMatchSumScorer has good iteration capabilities, but if it is not
> used as the lead of the iteration then the advance() call might do a lot of
> work in order to find the next match, while we should instead let the lead
> iterator of the conjunction skip over non-matching documents. In this issue
> I'd like to explore changing MinShouldMatchSumScorer by giving it a two-phase
> iterator and making advance() return a candidate for the next match that is
> less good but much cheaper to compute.
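
The division of labor described in the issue can be sketched roughly as
follows. This is a simplified standalone model under stated assumptions, not
Lucene code: postings are plain sorted int arrays, the class and method names
are hypothetical, and the approximation simply returns the lowest doc ID that
any clause contains. For brevity it advances every clause eagerly, which a
real scorer would try to avoid.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a conjunction between a lead iterator and a
// min-should-match clause that exposes a cheap approximation plus a check.
final class TwoPhaseConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static final class MinShouldMatchSketch {
        private final int[][] postings;   // one sorted doc ID array per optional clause
        private final int[] positions;    // current index into each clause's postings
        private final int minShouldMatch;
        int matchesCalls = 0;             // counts how often the verification step ran

        MinShouldMatchSketch(int[][] postings, int minShouldMatch) {
            this.postings = postings;
            this.positions = new int[postings.length];
            this.minShouldMatch = minShouldMatch;
        }

        private int current(int clause) {
            int pos = positions[clause];
            return pos < postings[clause].length ? postings[clause][pos] : NO_MORE_DOCS;
        }

        /** Approximation: lowest doc >= target found in any clause (may not match). */
        int advance(int target) {
            int min = NO_MORE_DOCS;
            for (int c = 0; c < postings.length; c++) {
                while (current(c) < target) {
                    positions[c]++;
                }
                min = Math.min(min, current(c));
            }
            return min;
        }

        /** Verification: do at least minShouldMatch clauses sit on this doc? */
        boolean matches(int doc) {
            matchesCalls++;
            int count = 0;
            for (int c = 0; c < postings.length; c++) {
                if (current(c) == doc) {
                    count++;
                }
            }
            return count >= minShouldMatch;
        }
    }

    /** The lead drives iteration; matches() runs only when both sides agree on a doc. */
    static List<Integer> intersect(int[] lead, MinShouldMatchSketch msm) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        while (i < lead.length) {
            int doc = lead[i];
            int candidate = msm.advance(doc);
            if (candidate == doc) {
                if (msm.matches(doc)) {
                    hits.add(doc);
                }
                i++;
            } else {
                // Let the lead skip ahead to the candidate instead of scanning.
                while (i < lead.length && lead[i] < candidate) {
                    i++;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] clauses = {{1, 5, 10, 30}, {5, 7, 30}, {10, 30, 40}};
        MinShouldMatchSketch msm = new MinShouldMatchSketch(clauses, 2);
        System.out.println(intersect(new int[] {5, 7, 10, 20, 30}, msm)); // [5, 10, 30]
    }
}
```

In this model the verification step runs only for documents that the lead
iterator also contains, which is the effect the issue is after when the
scorer is not the lead of the conjunction.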