[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Da Huang updated LUCENE-4396:
-----------------------------
Attachments: LUCENE-4396.patch, And.tasks
A patch based on Lucene GitHub mirror commit
cf10341825ff6bd1662dd48c51926bc51d751ce5.
I use a bitset to skip required docs when scanning optional and prohibited docs.
The perf. comparison is below.
I also built a new tasks file to test the perf., and I discovered that BNS
speeds up the "+a -b -c -d ..." case a lot when "b c d ..." hit many docs
(see the HighAndTonsHighNot and LowAndTonsHighNot rows).
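A minimal sketch of how a per-window bitset could serve the "+a -b -c ..." case, written as plain Java over sorted doc-ID arrays rather than Lucene's scorer classes. It assumes a single required clause, 2048-doc windows, and one reading of "skipping": windows containing no required doc are never visited, and optional hits are only counted on docs whose bit is still set. BitsetWindowSketch, score and skipTo are hypothetical names; this is not the code in the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch (not the LUCENE-4396 patch): "+a -b -c ... (optional d ...)" with a window bitset. */
final class BitsetWindowSketch {
  static final int WINDOW = 2048; // assumed window size

  /** Returns {docID, matchingOptionalClauses} pairs in doc order; all lists are sorted and duplicate-free. */
  static List<int[]> score(int[] required, int[][] prohibited, int[][] optional, int maxDoc) {
    List<int[]> hits = new ArrayList<>();
    BitSet live = new BitSet(WINDOW);          // window slots that match the required clause and no prohibited one
    int[] optionalMatches = new int[WINDOW];   // per-slot count of matching optional clauses
    int[] proPos = new int[prohibited.length]; // cursor into each prohibited list
    int[] optPos = new int[optional.length];   // cursor into each optional list
    int reqPos = 0;
    while (reqPos < required.length && required[reqPos] < maxDoc) {
      // Jump to the window holding the next required doc; windows without required docs are never visited.
      int base = (required[reqPos] / WINDOW) * WINDOW;
      int end = Math.min(base + WINDOW, maxDoc);
      live.clear();
      Arrays.fill(optionalMatches, 0);
      while (reqPos < required.length && required[reqPos] < end) {
        live.set(required[reqPos++] - base);
      }
      // Prohibited clauses: binary-search past postings before this window, then clear matching bits.
      for (int c = 0; c < prohibited.length; c++) {
        int[] list = prohibited[c];
        int p = skipTo(list, proPos[c], base);
        for (; p < list.length && list[p] < end; p++) {
          live.clear(list[p] - base);
        }
        proPos[c] = p;
      }
      // Optional clauses: only docs whose bit is still set are counted.
      for (int c = 0; c < optional.length; c++) {
        int[] list = optional[c];
        int p = skipTo(list, optPos[c], base);
        for (; p < list.length && list[p] < end; p++) {
          int slot = list[p] - base;
          if (live.get(slot)) {
            optionalMatches[slot]++;
          }
        }
        optPos[c] = p;
      }
      // Emit surviving docs in increasing doc order.
      for (int slot = live.nextSetBit(0); slot >= 0; slot = live.nextSetBit(slot + 1)) {
        hits.add(new int[] {base + slot, optionalMatches[slot]});
      }
    }
    return hits;
  }

  /** First index >= from whose doc ID is >= target. */
  static int skipTo(int[] list, int from, int target) {
    int i = Arrays.binarySearch(list, from, list.length, target);
    return i >= 0 ? i : -i - 1;
  }

  public static void main(String[] args) {
    int[] must = {1, 5, 9, 10_000};            // +a
    int[][] mustNot = {{5, 6, 7}, {9_999}};    // -b -c
    int[][] should = {{1, 10_000, 11_000}};    // optional d
    for (int[] hit : score(must, mustNot, should, 12_000)) {
      System.out.println("doc=" + hit[0] + " optionalMatches=" + hit[1]);
    }
  }
}
```

In this example doc 5 is removed by the first prohibited list, and the windows covering docs 2048 to 8191 are never visited because they hold no required doc. Inside a visited window the prohibited and optional lists are still walked doc by doc; a real scorer might use advance() there instead.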
<code>
BNS (without bitset) vs. BS2
Task                QPS baseline   StdDev   QPS my_version   StdDev   Pct diff
HighAndTonsLowNot           4.29   (2.9%)             1.08   (0.6%)    -74.8% ( -76% - -73%)
HighAndTonsLowOr            4.87   (6.4%)             1.24   (1.0%)    -74.4% ( -76% - -71%)
HighAndSomeLowNot           9.03   (5.2%)             4.11   (4.1%)    -54.4% ( -60% - -47%)
HighAndSomeLowOr           16.21   (9.6%)             7.75   (4.1%)    -52.2% ( -60% - -42%)
LowAndSomeLowOr           303.28   (2.4%)           183.14   (6.6%)    -39.6% ( -47% - -31%)
LowAndSomeLowNot          257.24   (1.8%)           157.07   (6.5%)    -38.9% ( -46% - -31%)
LowAndSomeHighOr           36.78   (1.9%)            33.74   (3.0%)     -8.3% ( -12% - -3%)
LowAndTonsLowNot           21.28   (2.0%)            19.69   (6.9%)     -7.5% ( -16% - 1%)
LowAndSomeHighNot          34.40   (1.6%)            33.69   (3.2%)     -2.1% ( -6% - 2%)
PKLookup                  100.63   (4.8%)           103.46   (4.7%)      2.8% ( -6% - 12%)
LowAndTonsHighOr            1.26   (1.6%)             1.41   (1.7%)     11.8% ( 8% - 15%)
LowAndTonsLowOr            13.66   (0.9%)            15.50   (6.0%)     13.5% ( 6% - 20%)
HighAndSomeHighNot          2.65   (1.4%)             3.12   (6.5%)     17.6% ( 9% - 25%)
HighAndSomeHighOr           2.21   (2.4%)             2.62   (5.8%)     18.6% ( 10% - 27%)
HighAndTonsHighOr           0.07   (0.8%)             0.19  (10.5%)    160.3% ( 147% - 172%)
LowAndTonsHighNot           2.86   (1.6%)            10.24  (18.1%)    257.7% ( 234% - 281%)
HighAndTonsHighNot          0.05   (0.8%)             0.40  (28.2%)    641.8% ( 607% - 676%)

BS vs. BS2
Task                QPS baseline   StdDev   QPS my_version   StdDev   Pct diff
HighAndTonsLowOr            4.02   (6.8%)             0.87   (0.5%)    -78.2% ( -80% - -76%)
HighAndTonsLowNot           4.95   (3.4%)             1.29   (0.9%)    -73.9% ( -75% - -72%)
HighAndSomeLowOr           14.45   (9.5%)             6.68   (3.7%)    -53.8% ( -61% - -44%)
HighAndSomeLowNot          14.78   (5.1%)             7.48   (3.9%)    -49.4% ( -55% - -42%)
LowAndSomeLowOr           316.55   (2.2%)           170.14   (5.6%)    -46.3% ( -52% - -39%)
LowAndSomeLowNot          283.47   (1.7%)           157.35   (6.0%)    -44.5% ( -51% - -37%)
LowAndSomeHighOr           39.39   (2.0%)            35.07   (3.1%)    -11.0% ( -15% - -6%)
LowAndSomeHighNot          53.96   (2.0%)            48.57   (3.8%)    -10.0% ( -15% - -4%)
LowAndTonsLowNot           17.97   (1.5%)            17.04   (6.0%)     -5.2% ( -12% - 2%)
PKLookup                   97.57   (2.7%)           100.21   (5.2%)      2.7% ( -5% - 10%)
LowAndTonsHighOr            3.59   (1.7%)             3.74   (2.4%)      4.1% ( 0% - 8%)
LowAndTonsLowOr            14.71   (1.3%)            15.63   (5.7%)      6.3% ( 0% - 13%)
HighAndSomeHighNot          1.84   (1.3%)             2.05   (5.6%)     11.2% ( 4% - 18%)
HighAndSomeHighOr           1.93   (2.1%)             2.16   (5.6%)     11.9% ( 4% - 20%)
HighAndTonsHighOr           0.05   (1.0%)             0.13  (14.1%)    144.8% ( 128% - 161%)
LowAndTonsHighNot           1.63   (1.9%)             4.95   (7.2%)    204.0% ( 191% - 217%)
HighAndTonsHighNot          0.06   (1.0%)             0.34  (18.2%)    459.6% ( 435% - 483%)

BNS (with bitset) vs. BS2
Task                QPS baseline   StdDev   QPS my_version   StdDev   Pct diff
HighAndSomeLowOr            7.45  (12.0%)             3.49   (6.6%)    -53.1% ( -64% - -39%)
HighAndSomeLowNot          10.45   (8.0%)             5.25   (6.8%)    -49.7% ( -59% - -37%)
LowAndSomeLowOr           310.53   (2.3%)           168.56   (5.8%)    -45.7% ( -52% - -38%)
LowAndSomeLowNot          292.05   (2.3%)           165.88   (5.7%)    -43.2% ( -50% - -36%)
HighAndTonsLowNot           5.94   (3.5%)             4.33   (6.8%)    -27.0% ( -36% - -17%)
HighAndTonsLowOr            5.92   (4.4%)             4.39   (6.0%)    -25.9% ( -34% - -16%)
LowAndSomeHighNot          53.79   (2.4%)            47.71   (2.8%)    -11.3% ( -16% - -6%)
LowAndSomeHighOr           31.03   (2.6%)            28.20   (2.4%)     -9.1% ( -13% - -4%)
LowAndTonsLowOr            18.58   (1.1%)            17.60   (6.2%)     -5.3% ( -12% - 2%)
HighAndSomeHighNot          1.49   (1.8%)             1.44   (8.9%)     -3.5% ( -13% - 7%)
PKLookup                   96.96   (3.4%)           100.03   (5.1%)      3.2% ( -5% - 12%)
LowAndTonsHighOr            2.06   (2.2%)             2.18   (2.3%)      5.9% ( 1% - 10%)
LowAndTonsLowNot           13.63   (1.3%)            14.57   (6.3%)      6.9% ( 0% - 14%)
HighAndSomeHighOr           2.03   (2.4%)             2.33   (8.1%)     14.5% ( 3% - 25%)
HighAndTonsHighOr           0.07   (0.8%)             0.17  (13.6%)    158.2% ( 142% - 174%)
LowAndTonsHighNot           1.40   (2.2%)             6.21  (11.3%)    344.2% ( 323% - 365%)
HighAndTonsHighNot          0.07   (1.1%)             0.46  (24.2%)    572.1% ( 540% - 604%)
</code>
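The Pct diff column can be sanity-checked by hand. This sketch assumes Pct diff is the relative change from the baseline's mean QPS to my_version's mean QPS; luceneutil's exact computation is not shown in this message, and the bracketed range (which also reflects StdDev) is not modeled. PctDiffSketch and pctDiff are hypothetical names.

```java
/** Hypothetical helper, assuming Pct diff = (my_version QPS - baseline QPS) / baseline QPS. */
final class PctDiffSketch {
  /** Percent change from baseline QPS to candidate QPS, rounded to one decimal place. */
  static double pctDiff(double baselineQps, double candidateQps) {
    double pct = (candidateQps - baselineQps) / baselineQps * 100.0;
    return Math.round(pct * 10.0) / 10.0;
  }

  public static void main(String[] args) {
    // Rows from "BNS (without bitset) vs. BS2", using the displayed (rounded) QPS values.
    System.out.println("HighAndTonsLowNot  " + pctDiff(4.29, 1.08));
    System.out.println("LowAndSomeLowOr    " + pctDiff(303.28, 183.14));
    System.out.println("HighAndTonsHighNot " + pctDiff(0.05, 0.40));
  }
}
```

With the displayed values this reproduces -74.8% for HighAndTonsLowNot and -39.6% for LowAndSomeLowOr, but gives 700.0% for HighAndTonsHighNot instead of the reported 641.8%, presumably because QPS values that small lose most of their precision when rounded to two decimals.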
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, luceneutil-score-equal.patch, luceneutil-score-equal.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that, unless the MUST clauses have a very low hit count compared
> to the other clauses, BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as a proxy for the total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, e.g. if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!
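The suggestion to .advance() the SHOULD clauses when the MUST clause jumps can be sketched as a leap-frog loop: the MUST clause drives iteration, and each SHOULD clause that is behind it is repositioned with a single advance(target) call instead of many nextDoc() calls. DocIterator is a hypothetical, simplified stand-in for Lucene's DocIdSetIterator (it mimics the docID()/nextDoc()/advance() shape and a NO_MORE_DOCS sentinel) backed by a sorted array with binary search; AdvanceSketch, score and the steps counter are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: drive scoring from the MUST clause and advance() lagging SHOULD clauses. */
final class AdvanceSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Simplified, hypothetical stand-in for Lucene's DocIdSetIterator over a sorted, duplicate-free array. */
  static final class DocIterator {
    private final int[] docs;
    private int idx = -1; // -1 means "not positioned yet"
    int steps;            // number of nextDoc()/advance() calls, to show how little work is done

    DocIterator(int... docs) {
      this.docs = docs;
    }

    int docID() {
      if (idx < 0) return -1;
      return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    int nextDoc() {
      steps++;
      if (idx < docs.length) idx++;
      return docID();
    }

    /** Moves to the first doc >= target via binary search over the remaining docs. */
    int advance(int target) {
      steps++;
      if (idx >= docs.length) return NO_MORE_DOCS;
      int i = Arrays.binarySearch(docs, idx + 1, docs.length, target);
      idx = i >= 0 ? i : -i - 1;
      return docID();
    }
  }

  /** For each MUST hit, returns {doc, number of SHOULD clauses that also match it}. */
  static List<int[]> score(DocIterator must, DocIterator... should) {
    List<int[]> hits = new ArrayList<>();
    for (int doc = must.nextDoc(); doc != NO_MORE_DOCS; doc = must.nextDoc()) {
      int matches = 0;
      for (DocIterator s : should) {
        int sd = s.docID();
        // Only move a SHOULD clause that is behind the MUST doc: a big jump in the
        // MUST clause turns into one advance() call rather than a long nextDoc() scan.
        if (sd < doc) {
          sd = s.advance(doc);
        }
        if (sd == doc) {
          matches++;
        }
      }
      hits.add(new int[] {doc, matches});
    }
    return hits;
  }

  public static void main(String[] args) {
    int[] evens = new int[500_001];
    for (int i = 0; i < evens.length; i++) {
      evens[i] = 2 * i; // 0, 2, 4, ..., 1_000_000
    }
    DocIterator must = new DocIterator(4, 1_000_000);
    DocIterator should = new DocIterator(evens);
    for (int[] hit : score(must, should)) {
      System.out.println("doc=" + hit[0] + " shouldMatches=" + hit[1]);
    }
    System.out.println("SHOULD clause repositioned " + should.steps + " times");
  }
}
```

Here the SHOULD clause matches every even doc up to 1,000,000, yet it is repositioned only twice, because the MUST clause jumps from doc 4 straight to doc 1,000,000.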
--
This message was sent by Atlassian JIRA
(v6.2#6252)