[
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072693#comment-14072693
]
Da Huang edited comment on LUCENE-4396 at 7/24/14 8:53 AM:
-----------------------------------------------------------
This patch is based on git mirror commit
ce7d0578b30981d15687bf76aec595274efccbad.
This is a first attempt at merging the scorers to get better performance for
boolean retrieval.
I created a new class, BooleanMixedScorerDecider, which chooses the best
scorer.
The selection rules still need improvement; I am working on an elegant way to
define them.
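For illustration only, one way to express such rules is as an ordered list of predicates, where each rule either picks a scorer or abstains. This is a sketch under stated assumptions, not the class from the patch: `DeciderSketch`, `QueryStats`, `Rule`, `estimateHits()` and the factor of 10 are all hypothetical, and the enum values merely stand in for Lucene's BooleanScorer and BooleanScorer2. `estimateHits()` follows the issue description's suggestion of using firstDocID as a proxy for total hit count, assuming hits are spread roughly uniformly across the index.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch of a rule-based scorer decider. The names, fields and
 * thresholds are illustrative only; this is not the class from the patch.
 */
public class DeciderSketch {

    /** Stand-ins for Lucene's BooleanScorer and BooleanScorer2. */
    enum Choice { BOOLEAN_SCORER, BOOLEAN_SCORER2 }

    /** Assumed per-query statistics a decider might inspect. */
    static final class QueryStats {
        final int mustClauses;
        final long mustHits;   // estimated hits of the sparsest MUST clause
        final long shouldHits; // estimated hits summed over the SHOULD clauses

        QueryStats(int mustClauses, long mustHits, long shouldHits) {
            this.mustClauses = mustClauses;
            this.mustHits = mustHits;
            this.shouldHits = shouldHits;
        }
    }

    /** A rule either picks a scorer or abstains by returning empty. */
    interface Rule {
        Optional<Choice> apply(QueryStats s);
    }

    /**
     * Rough hit-count estimate from a clause's first doc ID, assuming hits are
     * spread uniformly over [0, maxDoc): the first hit then lands near
     * maxDoc / hits, so hits is roughly maxDoc / (firstDocId + 1).
     */
    static long estimateHits(int firstDocId, int maxDoc) {
        return maxDoc / (firstDocId + 1L);
    }

    /** Rules are tried in order; the first non-empty answer wins. */
    static final List<Rule> RULES = List.of(
        // Today's behaviour: with no MUST clauses, use BooleanScorer.
        s -> s.mustClauses == 0 ? Optional.of(Choice.BOOLEAN_SCORER) : Optional.empty(),
        // A MUST clause much sparser than the SHOULD clauses should drive
        // iteration via advance(), so prefer BooleanScorer2 (factor 10 is arbitrary).
        s -> s.mustHits * 10 < s.shouldHits ? Optional.of(Choice.BOOLEAN_SCORER2) : Optional.empty()
    );

    static Choice decide(QueryStats s) {
        for (Rule rule : RULES) {
            Optional<Choice> choice = rule.apply(s);
            if (choice.isPresent()) {
                return choice.get();
            }
        }
        return Choice.BOOLEAN_SCORER; // hypothetical default when no rule fires
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        long must = estimateHits(9_999, maxDoc); // first hit at doc 9999 -> ~100 hits
        long should = estimateHits(9, maxDoc);   // first hit at doc 9 -> ~100000 hits
        System.out.println(decide(new QueryStats(1, must, should)));
    }
}
```

Running `main()` prints `BOOLEAN_SCORER2`, because the estimated 100 MUST hits are far fewer than the estimated 100000 SHOULD hits. Keeping each rule as a separate lambda makes it easy to reorder or add rules without touching `decide()`.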
{code}
Task                QPS baseline   StdDev  QPS my_version   StdDev   Pct diff
HighAndSomeLowNot          11.53   (7.3%)           10.75  (10.1%)    -6.8% ( -22% -  11%)
HighAndTonsLowNot           4.87   (4.0%)            4.64   (6.0%)    -4.9% ( -14% -   5%)
LowAndSomeLowOr           306.20   (2.2%)          299.06   (2.8%)    -2.3% (  -7% -   2%)
HighAndSomeLowOr           13.67   (9.4%)           13.38   (2.7%)    -2.1% ( -13% -  11%)
HighAndTonsLowOr            4.04   (6.4%)            3.96   (1.9%)    -1.9% (  -9% -   6%)
LowAndSomeLowNot          215.18   (1.9%)          211.14   (2.2%)    -1.9% (  -5% -   2%)
PKLookup                   96.26   (2.3%)           94.56   (2.8%)    -1.8% (  -6% -   3%)
HighAndTonsHighNot          0.06   (2.3%)            0.06   (2.6%)    -1.0% (  -5% -   4%)
HighAndTonsHighOr           0.06   (0.6%)            0.06   (1.3%)     0.9% (   0% -   2%)
HighAndSomeHighNot          1.59   (2.2%)            1.62   (2.9%)     1.7% (  -3% -   6%)
LowAndSomeHighNot          66.33   (2.1%)           68.77   (2.1%)     3.7% (   0% -   8%)
LowAndSomeHighOr           53.75   (1.6%)           56.86   (2.1%)     5.8% (   1% -   9%)
LowAndTonsLowNot           14.00   (1.7%)           14.84   (1.5%)     6.1% (   2% -   9%)
HighAndSomeHighOr           2.39   (2.2%)            2.68   (3.5%)    12.4% (   6% -  18%)
LowAndTonsLowOr            17.69   (0.9%)           21.64   (1.7%)    22.3% (  19% -  25%)
LowAndTonsHighOr            1.83   (1.3%)            2.33   (2.4%)    27.2% (  23% -  31%)
LowAndTonsHighNot           1.15   (1.5%)            1.51   (3.1%)    30.9% (  25% -  36%)
{code}
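The Pct diff column appears to be the relative change in mean QPS from baseline to my_version. A minimal sketch, assuming that formula (the class name `PctDiff` is hypothetical, and the bracketed range, which depends on the standard deviations, is not modelled):

```java
/** Relative QPS change, assuming "Pct diff" = (candidate - baseline) / baseline * 100. */
public class PctDiff {

    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // LowAndTonsLowOr row: 17.69 QPS (baseline) -> 21.64 QPS (my_version)
        System.out.printf("%.1f%%%n", pctDiff(17.69, 21.64));
    }
}
```

For the LowAndTonsLowOr row this prints `22.3%`, matching the table. Recomputing from the two-decimal QPS values can drift slightly from the printed column (HighAndSomeHighOr gives about 12.1% rather than 12.4%), presumably because the benchmark works from unrounded averages.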
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, SIZE.perf, all.perf, luceneutil-score-equal.patch,
> luceneutil-score-equal.patch, stat.cpp, stat.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have a very low hit count compared
> to the other clauses, BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST, so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics for when to use which (likely we
> would have to use firstDocID as a proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, e.g. if the MUST clause suddenly skips 1000000 docs, then you
> want to .advance() all the SHOULD clauses.
> I won't have near-term time to work on this, so feel free to take it if you
> are inspired!
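The advance() idea above can be illustrated with a simplified, doc-at-a-time sketch: the MUST clause drives iteration, and each SHOULD clause jumps straight to the MUST clause's current doc instead of enumerating every posting in between. This is an assumption-laden stand-in, not BooleanScorer itself: it does not model BooleanScorer's window-at-a-time collection, `MustDrivenSketch` and `DocIterator` are hypothetical, `DocIterator` only mimics the `nextDoc()`/`advance()`/`NO_MORE_DOCS` contract of Lucene's DocIdSetIterator, and binary search over an array replaces real skip data.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical doc-at-a-time sketch: a MUST clause drives iteration and each
 * SHOULD clause is advanced straight to the MUST clause's current doc, so
 * SHOULD postings inside a large gap are skipped instead of enumerated.
 */
public class MustDrivenSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Array-backed stand-in mimicking DocIdSetIterator's nextDoc()/advance(). */
    static final class DocIterator {
        private final int[] docs; // sorted, distinct doc IDs
        private int idx = -1;
        private int doc = -1;

        DocIterator(int... docs) {
            this.docs = docs;
        }

        int docID() {
            return doc;
        }

        int nextDoc() {
            idx++;
            doc = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
            return doc;
        }

        /** Moves to the first doc >= target; binary search stands in for skip data. */
        int advance(int target) {
            int lo = idx + 1;
            int hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            idx = lo;
            doc = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
            return doc;
        }
    }

    /** Returns {doc, number of matching SHOULD clauses} for each MUST hit. */
    static List<int[]> score(DocIterator must, List<DocIterator> shoulds) {
        List<int[]> hits = new ArrayList<>();
        for (int doc = must.nextDoc(); doc != NO_MORE_DOCS; doc = must.nextDoc()) {
            int matched = 0;
            for (DocIterator should : shoulds) {
                if (should.docID() < doc) {
                    should.advance(doc); // one jump over the whole gap
                }
                if (should.docID() == doc) {
                    matched++;
                }
            }
            hits.add(new int[] {doc, matched});
        }
        return hits;
    }

    public static void main(String[] args) {
        DocIterator must = new DocIterator(3, 1_000_000);
        DocIterator a = new DocIterator(1, 2, 3, 4, 5, 1_000_000);
        DocIterator b = new DocIterator(3, 7, 9);
        for (int[] hit : score(must, List.of(a, b))) {
            System.out.println("doc=" + hit[0] + " shouldMatches=" + hit[1]);
        }
    }
}
```

With these sample postings, `main()` prints `doc=3 shouldMatches=2` and then `doc=1000000 shouldMatches=1`; when the MUST clause jumps from doc 3 to doc 1000000, each SHOULD clause crosses the gap with a single advance() call rather than being walked doc by doc.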