[ https://issues.apache.org/jira/browse/LUCENE-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-6850:
---------------------------------
Attachment: LUCENE-6850.patch
I iterated on the previous patch to also optimize the case where all
clauses return a non-null BulkScorer, but some windows of 2048 documents
contain matches for only one of the sub-scorers. In that case we can call the
collector directly instead of going through a bit set and replaying it.
luceneutil on wikimedium10 shows a nice speedup (about 27%) for {{OrHighLow}}:
{noformat}
Task              QPS baseline   StdDev  QPS patch   StdDev  Pct diff
Fuzzy2                   54.93  (13.3%)      51.19  (16.9%)   -6.8% ( -32% -   26%)
OrHighHigh               37.94   (9.1%)      35.76   (7.0%)   -5.7% ( -20% -   11%)
OrHighMed                76.23   (9.0%)      73.41   (6.3%)   -3.7% ( -17% -   12%)
OrNotHighLow           1684.73   (4.6%)    1648.87   (6.6%)   -2.1% ( -12% -    9%)
IntNRQ                   13.63   (4.0%)      13.49   (4.8%)   -1.0% (  -9% -    8%)
AndHighLow              731.68   (2.6%)     726.44   (3.6%)   -0.7% (  -6% -    5%)
Respell                  61.24   (3.0%)      60.84   (3.7%)   -0.7% (  -7% -    6%)
HighSpanNear             22.89   (3.7%)      22.82   (4.0%)   -0.3% (  -7% -    7%)
HighTerm                136.93   (2.8%)     136.57   (3.1%)   -0.3% (  -5% -    5%)
MedSpanNear              72.54   (3.1%)      72.36   (3.5%)   -0.2% (  -6% -    6%)
MedPhrase                30.70   (1.9%)      30.63   (1.8%)   -0.2% (  -3% -    3%)
HighPhrase               35.13   (3.8%)      35.12   (3.5%)   -0.1% (  -7% -    7%)
MedTerm                 184.28   (3.1%)     184.23   (2.5%)   -0.0% (  -5% -    5%)
AndHighHigh              16.74   (1.4%)      16.76   (1.4%)    0.1% (  -2% -    2%)
LowSpanNear              39.03   (1.8%)      39.08   (2.3%)    0.1% (  -3% -    4%)
Wildcard                 43.57   (2.6%)      43.66   (2.9%)    0.2% (  -5% -    5%)
AndHighMed              178.28   (1.5%)     178.78   (2.1%)    0.3% (  -3% -    3%)
OrHighNotMed             71.53   (4.7%)      71.79   (2.8%)    0.4% (  -6% -    8%)
OrNotHighMed             79.22   (2.6%)      79.65   (2.0%)    0.5% (  -3% -    5%)
OrNotHighHigh            61.27   (3.0%)      61.61   (2.0%)    0.6% (  -4% -    5%)
LowTerm                 818.90   (5.9%)     823.47   (4.3%)    0.6% (  -9% -   11%)
Prefix3                 176.52   (2.9%)     177.57   (3.2%)    0.6% (  -5% -    6%)
LowPhrase               380.46   (3.4%)     383.13   (3.4%)    0.7% (  -5% -    7%)
MedSloppyPhrase         155.97   (3.5%)     157.16   (2.8%)    0.8% (  -5% -    7%)
OrHighNotHigh            45.73   (3.1%)      46.09   (1.9%)    0.8% (  -4% -    5%)
LowSloppyPhrase          65.95   (2.0%)      66.59   (1.6%)    1.0% (  -2% -    4%)
OrHighNotLow             97.93   (4.8%)      99.02   (2.4%)    1.1% (  -5% -    8%)
Fuzzy1                   49.26   (6.6%)      50.06   (6.7%)    1.6% ( -10% -   16%)
HighSloppyPhrase         24.74   (4.2%)      25.65   (5.8%)    3.7% (  -6% -   14%)
OrHighLow                84.42   (7.7%)     107.15   (8.4%)   26.9% (  10% -   46%)
{noformat}
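For readers who want to see the shape of the per-window shortcut described above, here is a minimal sketch. It is not the code from the attached patch and does not use Lucene's classes: {{WindowedOrSketch}}, {{DocCollector}} and the representation of each clause as a sorted {{int[]}} of doc IDs are all hypothetical stand-ins, and scores are ignored entirely. Only the window size of 2048 comes from the comment above.

```java
import java.util.BitSet;

/** Illustrative only: hypothetical types, not Lucene's BooleanScorer. */
final class WindowedOrSketch {
    /** Window size mentioned in the comment above. */
    static final int WINDOW = 2048;

    /** Hypothetical stand-in for a collector; scores are left out of this sketch. */
    interface DocCollector {
        void collect(int doc);
    }

    /**
     * Collects the union of the clauses, window by window.
     * Each clause is assumed to be a sorted array of doc IDs below maxDoc.
     * Returns {windows collected directly, windows collected via the bit set}.
     */
    static int[] score(int[][] clauses, int maxDoc, DocCollector collector) {
        int direct = 0;
        int viaBitSet = 0;
        int[] pos = new int[clauses.length]; // next unread index per clause
        BitSet window = new BitSet(WINDOW);

        for (int base = 0; base < maxDoc; base += WINDOW) {
            int end = Math.min(base + WINDOW, maxDoc);

            // Count the clauses that have at least one match in [base, end).
            int active = 0;
            int only = -1;
            for (int c = 0; c < clauses.length; c++) {
                if (pos[c] < clauses[c].length && clauses[c][pos[c]] < end) {
                    active++;
                    only = c;
                }
            }

            if (active == 0) {
                continue; // nothing matches in this window
            }

            if (active == 1) {
                // Shortcut: a single clause matches here, so its docs are already
                // in order and deduplicated. Call the collector directly.
                int[] docs = clauses[only];
                while (pos[only] < docs.length && docs[pos[only]] < end) {
                    collector.collect(docs[pos[only]++]);
                }
                direct++;
            } else {
                // General path: merge the clauses into a bit set, then replay it.
                window.clear();
                for (int c = 0; c < clauses.length; c++) {
                    int[] docs = clauses[c];
                    while (pos[c] < docs.length && docs[pos[c]] < end) {
                        window.set(docs[pos[c]++] - base);
                    }
                }
                for (int i = window.nextSetBit(0); i >= 0; i = window.nextSetBit(i + 1)) {
                    collector.collect(base + i);
                }
                viaBitSet++;
            }
        }
        return new int[] {direct, viaBitSet};
    }
}
```

With a frequent clause and a rare clause, most windows would contain matches only for the frequent one and take the direct branch, which is consistent with {{OrHighLow}} being the task that benefits.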
> BooleanWeight should not use BS1 when there is a single non-null clause
> -----------------------------------------------------------------------
>
> Key: LUCENE-6850
> URL: https://issues.apache.org/jira/browse/LUCENE-6850
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-6850.patch, LUCENE-6850.patch
>
>
> When a disjunction has a single non-null scorer, we still use BS1 for
> bulk-scoring, which first collects matches into a bit set and then calls the
> collector. This is inefficient: we should just call the inner bulk scorer
> directly and wrap the scorer to apply the coord factor (like
> BooleanTopLevelScorers.BoostedScorer does).
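The idea in the description can be sketched as follows. This is only an illustration under stated assumptions: the {{Scorer}}, {{Collector}} and {{BulkScorer}} interfaces below are simplified, hypothetical stand-ins rather than Lucene's actual signatures, and {{boosted}} merely imitates what the description says BooleanTopLevelScorers.BoostedScorer does (multiplying the score by a constant factor).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: simplified, hypothetical interfaces, not Lucene's API. */
final class SingleClauseSketch {
    interface Scorer {
        float score();
    }

    interface Collector {
        void collect(int doc, Scorer scorer);
    }

    interface BulkScorer {
        void score(Collector collector);
    }

    /** Wraps a bulk scorer so the collector sees every score multiplied by coord. */
    static BulkScorer boosted(BulkScorer inner, float coord) {
        return collector ->
            inner.score((doc, scorer) -> collector.collect(doc, () -> scorer.score() * coord));
    }

    /**
     * Picks a bulk scorer for a disjunction. A null entry means the clause
     * matches nothing. Only the single-clause shortcut is shown here.
     */
    static BulkScorer disjunction(List<BulkScorer> clauses, float coord) {
        List<BulkScorer> nonNull = new ArrayList<>();
        for (BulkScorer clause : clauses) {
            if (clause != null) {
                nonNull.add(clause);
            }
        }
        if (nonNull.isEmpty()) {
            return null; // no clause can match
        }
        if (nonNull.size() == 1) {
            // No bit set needed: delegate to the only clause, adjusting scores.
            return boosted(nonNull.get(0), coord);
        }
        throw new UnsupportedOperationException(
            "general multi-clause (bit set) path omitted from this sketch");
    }
}
```

The wrapper adds one indirection per collected document but avoids the collect-into-bit-set-then-replay round trip entirely.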