[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Da Huang updated LUCENE-4396:
-----------------------------
Attachment: LUCENE-4396.patch
This is a patch based on the git mirror commit
7f66461aea7bc2cb6f31a993cba77734e5e0f9d9.
In this patch, I implement the bucketTable as an array rather than as a hash table.
Its performance appears to be better than that of the previous patches in most cases.
As you know, after putting the required docs into the bucketTable, I have to scan both
the table and the optional docs. Here, I have tried skipping ahead while scanning the
bucketTable, instead of visiting every entry, to improve performance. The results are as follows.
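To illustrate the array-versus-hash-table choice, here is a minimal sketch. It is not the code in LUCENE-4396.patch: ArrayBucketTable, WINDOW_SIZE (2048 here) and the method names are hypothetical, and it assumes a single MUST clause and that every doc passed in lies inside the current window. The point is that a doc's bucket is found by subtracting the window base, with no hashing or probing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not the patch's code): a bucket table stored as plain
 * arrays indexed by (doc - windowBase), so lookups need no hashing.
 * Assumes a single MUST clause.
 */
final class ArrayBucketTable {
    static final int WINDOW_SIZE = 2048; // assumed window size

    private final boolean[] filled = new boolean[WINDOW_SIZE];
    private final float[] scores = new float[WINDOW_SIZE];
    private final int[] matchCount = new int[WINDOW_SIZE];
    private int windowBase;

    /** Clears the table and starts a window covering [base, base + WINDOW_SIZE). */
    void reset(int base) {
        windowBase = base;
        Arrays.fill(filled, false);
        Arrays.fill(scores, 0f);
        Arrays.fill(matchCount, 0);
    }

    /** First pass: record a hit from the required (MUST) clause. */
    void addRequired(int doc, float score) {
        int slot = doc - windowBase; // direct index; caller keeps doc inside the window
        filled[slot] = true;
        scores[slot] += score;
        matchCount[slot]++;
    }

    /** Second pass: fold in an optional (SHOULD) hit, ignored unless the doc has a required hit. */
    boolean addOptional(int doc, float score) {
        int slot = doc - windowBase;
        if (slot < 0 || slot >= WINDOW_SIZE || !filled[slot]) {
            return false;
        }
        scores[slot] += score;
        matchCount[slot]++;
        return true;
    }

    /** Collects the window's matching docs in increasing doc order. */
    List<Integer> matchingDocs() {
        List<Integer> docs = new ArrayList<>();
        for (int slot = 0; slot < WINDOW_SIZE; slot++) {
            if (filled[slot]) {
                docs.add(windowBase + slot);
            }
        }
        return docs;
    }

    float score(int doc) { return scores[doc - windowBase]; }

    int matches(int doc) { return matchCount[doc - windowBase]; }
}
```

The array trades a fixed amount of memory per window for constant-time indexing, whereas a hash table only pays for the slots actually used.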
{code}
No skip
Task                QPS baseline   StdDev  QPS my_version   StdDev  Pct diff
HighAndTonsLowNot           6.56   (3.1%)            2.59   (1.0%)  -60.5% ( -62% - -58%)
HighAndTonsLowOr            6.43   (3.3%)            2.58   (0.8%)  -59.9% ( -61% - -57%)
HighAndSomeLowOr            8.49   (8.5%)            4.05   (1.8%)  -52.3% ( -57% - -45%)
HighAndSomeLowNot           6.17   (8.6%)            3.16   (2.1%)  -48.8% ( -54% - -41%)
LowAndSomeLowOr           250.58   (2.0%)          194.86   (1.6%)  -22.2% ( -25% - -18%)
LowAndSomeLowNot          178.66   (1.6%)          147.67   (2.2%)  -17.3% ( -20% - -13%)
LowAndSomeHighOr           40.71   (2.8%)           41.50   (1.8%)  2.0% ( -2% - 6%)
PKLookup                   97.59   (3.0%)           99.52   (4.6%)  2.0% ( -5% - 9%)
LowAndSomeHighNot          20.76   (3.0%)           21.54   (2.3%)  3.7% ( -1% - 9%)
HighAndSomeHighNot          2.22   (1.7%)            2.67   (4.4%)  20.3% ( 13% - 26%)
LowAndTonsHighNot           3.81   (2.3%)            4.60   (2.1%)  20.8% ( 15% - 25%)
LowAndTonsHighOr            2.87   (2.3%)            3.48   (2.6%)  21.2% ( 15% - 26%)
HighAndSomeHighOr           1.74   (2.1%)            2.16   (3.5%)  24.0% ( 18% - 30%)
LowAndTonsLowOr            18.66   (1.3%)           23.68   (1.9%)  26.9% ( 23% - 30%)
LowAndTonsLowNot           16.01   (1.4%)           22.16   (2.8%)  38.4% ( 33% - 43%)
HighAndTonsHighOr           0.04   (0.9%)            0.11   (9.8%)  158.2% ( 146% - 170%)
HighAndTonsHighNot          0.06   (1.1%)            0.15  (13.5%)  166.2% ( 149% - 182%)
---------------------------------------------------
Binary search skip
Task                QPS baseline   StdDev  QPS my_version   StdDev  Pct diff
HighAndTonsLowNot           6.22   (3.8%)            2.45   (0.9%)  -60.6% ( -62% - -58%)
HighAndSomeLowOr            8.29  (11.2%)            4.40   (3.0%)  -46.9% ( -54% - -36%)
HighAndSomeLowNot          12.34   (7.1%)            6.65   (2.6%)  -46.1% ( -52% - -39%)
LowAndSomeLowOr           232.38   (2.9%)          165.05   (1.8%)  -29.0% ( -32% - -24%)
HighAndTonsLowOr            5.17   (6.2%)            3.75   (3.0%)  -27.4% ( -34% - -19%)
LowAndSomeLowNot          227.71   (2.6%)          171.13   (3.2%)  -24.8% ( -29% - -19%)
HighAndSomeHighOr           1.35   (3.9%)            1.14   (3.5%)  -16.1% ( -22% - -9%)
LowAndSomeHighOr           50.17   (3.6%)           48.84   (3.7%)  -2.7% ( -9% - 4%)
LowAndSomeHighNot          52.71   (3.0%)           51.55   (3.8%)  -2.2% ( -8% - 4%)
PKLookup                   90.17   (3.5%)           91.38   (3.3%)  1.3% ( -5% - 8%)
HighAndSomeHighNot          1.69   (2.9%)            2.00   (6.3%)  18.5% ( 8% - 28%)
LowAndTonsLowOr            15.61   (1.9%)           18.59   (2.8%)  19.0% ( 14% - 24%)
LowAndTonsHighOr            1.82   (2.7%)            2.20   (4.6%)  20.7% ( 13% - 28%)
LowAndTonsLowNot           15.51   (1.7%)           20.14   (3.8%)  29.8% ( 23% - 35%)
LowAndTonsHighNot           1.01   (2.9%)            1.34   (6.5%)  31.7% ( 21% - 42%)
HighAndTonsHighOr           0.07   (0.9%)            0.12   (6.9%)  77.7% ( 69% - 86%)
HighAndTonsHighNot          0.07   (1.4%)            0.19  (11.9%)  162.4% ( 146% - 178%)
---------------------------------------------------
8 steps skip
Task                QPS baseline   StdDev  QPS my_version   StdDev  Pct diff
HighAndTonsLowNot           5.45   (3.3%)            1.69   (1.3%)  -69.0% ( -71% - -66%)
HighAndSomeLowOr            5.46  (11.0%)            2.76   (4.4%)  -49.5% ( -58% - -38%)
HighAndSomeLowNot          17.94   (5.7%)           10.40   (3.8%)  -42.1% ( -48% - -34%)
LowAndSomeLowOr           306.62   (1.7%)          231.45   (1.5%)  -24.5% ( -27% - -21%)
LowAndSomeLowNot          286.30   (1.7%)          218.13   (2.0%)  -23.8% ( -27% - -20%)
HighAndTonsLowOr            6.34   (3.5%)            5.31   (4.5%)  -16.3% ( -23% - -8%)
LowAndSomeHighOr           33.53   (2.1%)           33.85   (2.2%)  1.0% ( -3% - 5%)
PKLookup                   97.39   (1.9%)           98.40   (3.9%)  1.0% ( -4% - 6%)
LowAndSomeHighNot          42.16   (2.0%)           42.73   (2.1%)  1.4% ( -2% - 5%)
HighAndSomeHighOr           2.43   (2.4%)            2.76   (4.8%)  13.4% ( 6% - 21%)
HighAndSomeHighNot          2.74   (1.4%)            3.17   (4.6%)  15.7% ( 9% - 21%)
LowAndTonsHighOr            3.45   (1.8%)            4.21   (3.2%)  22.0% ( 16% - 27%)
LowAndTonsHighNot           2.37   (1.8%)            2.95   (3.0%)  24.6% ( 19% - 30%)
LowAndTonsLowOr            17.21   (1.1%)           22.50   (2.6%)  30.7% ( 26% - 34%)
LowAndTonsLowNot           13.60   (1.4%)           19.97   (2.4%)  46.8% ( 42% - 51%)
HighAndTonsHighOr           0.08   (0.5%)            0.19   (9.9%)  140.3% ( 129% - 151%)
HighAndTonsHighNot          0.06   (1.7%)            0.15  (12.0%)  163.9% ( 147% - 180%)
---------------------------------------------------
16 steps skip
Task                QPS baseline   StdDev  QPS my_version   StdDev  Pct diff
HighAndTonsLowNot           6.69   (2.0%)            2.71   (0.8%)  -59.5% ( -61% - -57%)
HighAndTonsLowOr            1.69  (10.1%)            0.89   (2.1%)  -47.1% ( -53% - -38%)
HighAndSomeLowOr            7.28  (11.5%)            3.96   (1.9%)  -45.6% ( -52% - -36%)
HighAndSomeLowNot          14.38   (5.2%)            8.09   (1.5%)  -43.7% ( -47% - -39%)
LowAndSomeLowOr           295.60   (2.3%)          223.80   (2.0%)  -24.3% ( -27% - -20%)
LowAndSomeLowNot          171.52   (1.7%)          140.82   (1.5%)  -17.9% ( -20% - -14%)
LowAndSomeHighOr           40.12   (2.1%)           41.32   (3.2%)  3.0% ( -2% - 8%)
PKLookup                   96.15   (2.4%)           99.15   (6.0%)  3.1% ( -5% - 11%)
LowAndSomeHighNot          31.53   (2.3%)           32.64   (2.9%)  3.5% ( -1% - 8%)
HighAndSomeHighNot          2.67   (1.3%)            3.04   (3.4%)  13.9% ( 9% - 18%)
HighAndSomeHighOr           2.11   (2.1%)            2.58   (3.3%)  22.5% ( 16% - 28%)
LowAndTonsHighOr            2.17   (1.8%)            2.67   (3.2%)  23.1% ( 17% - 28%)
LowAndTonsHighNot           2.53   (1.6%)            3.16   (3.3%)  25.2% ( 20% - 30%)
LowAndTonsLowNot           14.68   (0.9%)           20.97   (3.6%)  42.8% ( 38% - 47%)
LowAndTonsLowOr            14.04   (1.1%)           20.09   (2.6%)  43.0% ( 38% - 47%)
HighAndTonsHighOr           0.06   (0.7%)            0.15   (9.4%)  152.0% ( 141% - 163%)
HighAndTonsHighNot          0.05   (0.8%)            0.14  (12.1%)  167.3% ( 153% - 181%)
---------------------------------------------------
32 steps skip
Task                QPS baseline   StdDev  QPS my_version   StdDev  Pct diff
HighAndTonsLowNot           6.50   (2.6%)            3.24   (1.1%)  -50.1% ( -52% - -47%)
HighAndSomeLowNot           9.51   (6.4%)            4.87   (3.2%)  -48.8% ( -54% - -41%)
HighAndSomeLowOr           14.87  (11.6%)            8.81   (3.6%)  -40.8% ( -50% - -28%)
LowAndSomeLowOr           311.27   (2.6%)          241.43   (1.6%)  -22.4% ( -25% - -18%)
LowAndSomeLowNot          231.96   (2.4%)          181.95   (2.0%)  -21.6% ( -25% - -17%)
HighAndTonsLowOr            5.60   (5.7%)            4.45   (3.7%)  -20.5% ( -28% - -11%)
LowAndSomeHighNot          62.10   (2.6%)           60.59   (2.5%)  -2.4% ( -7% - 2%)
LowAndSomeHighOr           49.36   (3.0%)           48.87   (3.1%)  -1.0% ( -6% - 5%)
PKLookup                   96.38   (2.0%)           95.91   (2.5%)  -0.5% ( -4% - 4%)
HighAndSomeHighNot          2.08   (1.6%)            2.34   (5.2%)  12.7% ( 5% - 19%)
HighAndSomeHighOr           2.30   (2.6%)            2.63   (5.7%)  14.2% ( 5% - 23%)
LowAndTonsHighOr            1.88   (2.5%)            2.35   (4.2%)  25.5% ( 18% - 33%)
LowAndTonsHighNot           1.10   (2.5%)            1.45   (5.0%)  31.1% ( 23% - 39%)
LowAndTonsLowOr            14.38   (1.0%)           20.24   (3.2%)  40.8% ( 36% - 45%)
LowAndTonsLowNot           12.98   (1.0%)           18.82   (2.9%)  45.0% ( 40% - 49%)
HighAndTonsHighOr           0.08   (0.8%)            0.18  (12.3%)  138.0% ( 123% - 152%)
HighAndTonsHighNot          0.08   (1.1%)            0.21  (12.5%)  157.6% ( 142% - 172%)
{code}
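The scan strategies compared above (no skip, binary search skip, and 8/16/32 steps skip) can be sketched over a sorted int[] holding the doc IDs of the required hits, where the scan repeatedly moves a cursor to the first entry at or after the next optional doc. This is an illustration under assumptions, not the patch: SortedDocSkipper and its methods are hypothetical names, and it assumes "N steps skip" means jumping N entries at a time and then finishing with a short linear scan.

```java
import java.util.Arrays;

/** Hypothetical sketch: moving a cursor over a sorted array of required doc IDs. */
final class SortedDocSkipper {

    /** No skip: walk one entry at a time; returns the first index i >= from with docs[i] >= target. */
    static int linearAdvance(int[] docs, int from, int target) {
        int i = from;
        while (i < docs.length && docs[i] < target) {
            i++;
        }
        return i;
    }

    /** Binary search skip over docs[from, docs.length). */
    static int binaryAdvance(int[] docs, int from, int target) {
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        // A negative result encodes (-(insertion point) - 1).
        return idx >= 0 ? idx : -idx - 1;
    }

    /** N-step skip: jump `step` entries at a time, then finish linearly. */
    static int stepAdvance(int[] docs, int from, int target, int step) {
        int i = from;
        // Every entry up to i + step is still < target, so it is safe to jump there.
        while (i + step < docs.length && docs[i + step] < target) {
            i += step;
        }
        // Now docs[i + step] >= target (or is past the end): at most `step` linear steps remain.
        return linearAdvance(docs, i, target);
    }
}
```

Binary search touches O(log n) entries per skip but pays that cost even when the target is only an entry or two away; the fixed-step variant needs at most n/N jumps plus a linear pass of at most N steps, which stays cheap when targets are close together.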
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> luceneutil-score-equal.patch, luceneutil-score-equal.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have a very low hit count compared
> to the other clauses, BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, e.g. if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!
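The .advance() suggestion in the quoted description can be illustrated with a small doc-at-a-time sketch: for each doc of the MUST clause, each SHOULD iterator that is behind either steps with nextDoc() or, when the gap exceeds a threshold, jumps with advance(). DocIter, ArrayDocIter, MustWithShould and the ADVANCE_GAP threshold of 64 are all hypothetical. DocIter only mimics the nextDoc()/advance() contract of Lucene's DocIdSetIterator and is not that class, and the sketch ignores scoring and BooleanScorer's bucket windows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a doc ID iterator (not Lucene's DocIdSetIterator). */
interface DocIter {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
    int advance(int target); // moves to the first doc >= target, always past the current doc
}

/** Array-backed iterator for this illustration; counts calls so the effect is visible. */
final class ArrayDocIter implements DocIter {
    private final int[] docs;
    private int idx = -1;
    int nextDocCalls = 0;
    int advanceCalls = 0;

    ArrayDocIter(int... docs) { this.docs = docs; }

    public int docID() {
        if (idx < 0) return -1;
        return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
    }

    public int nextDoc() {
        nextDocCalls++;
        idx++;
        return docID();
    }

    public int advance(int target) {
        advanceCalls++;
        do {
            idx++;
        } while (idx < docs.length && docs[idx] < target);
        return docID();
    }
}

final class MustWithShould {
    /** Assumed threshold: gaps larger than this use advance() instead of nextDoc(). */
    static final int ADVANCE_GAP = 64;

    /** Returns {doc, number of SHOULD clauses also matching doc} for each MUST doc. */
    static List<int[]> score(DocIter must, List<DocIter> shoulds) {
        List<int[]> out = new ArrayList<>();
        for (int doc = must.nextDoc(); doc != DocIter.NO_MORE_DOCS; doc = must.nextDoc()) {
            int matches = 0;
            for (DocIter s : shoulds) {
                int cur = s.docID();
                if (cur < doc) {
                    if (doc - cur > ADVANCE_GAP) {
                        cur = s.advance(doc); // MUST clause jumped far ahead: skip directly
                    } else {
                        while (cur < doc) {
                            cur = s.nextDoc(); // small gap: plain stepping
                        }
                    }
                }
                if (cur == doc) {
                    matches++;
                }
            }
            out.add(new int[] {doc, matches});
        }
        return out;
    }
}
```

In a real scorer the threshold itself would be a heuristic, much like the choice of when to use BooleanScorer at all.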
--
This message was sent by Atlassian JIRA
(v6.2#6252)