[
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Da Huang updated LUCENE-4396:
-----------------------------
Attachment: LUCENE-4396.patch
This patch is based on git mirror commit
d707f783ab068b70752a3f9cfdc0dabb7f4fbadf.
In this patch, I tried to fix the .getChildren() problem in BAS and BLS.
I tried to make .bulkScorer() choose DAAT when scoreDocsInOrder is true.
However, I found that I also had to copy the scorer-selection logic into
.scoreDocsOutOfOrder() to make things right.
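The constraint can be illustrated with a minimal sketch. This is not the patch's code: `ScorerChoiceSketch`, `Strategy`, `preferTaat`, and both method names are hypothetical stand-ins. The assumption is that the caller asks about out-of-order scoring before it requests the bulk scorer, so both answers have to come from the same decision.

```java
// Hypothetical sketch, not Lucene code: one shared decision feeds both
// the "do I score out of order?" answer and the scorer that gets built.
final class ScorerChoiceSketch {
    enum Strategy { DAAT, TAAT }

    // preferTaat stands in for whatever heuristic favours TAAT (assumption).
    static Strategy choose(boolean scoreDocsInOrder, boolean preferTaat) {
        // In-order collection rules out TAAT, which may emit docs out of order.
        if (scoreDocsInOrder) {
            return Strategy.DAAT;
        }
        return preferTaat ? Strategy.TAAT : Strategy.DAAT;
    }

    // Mirrors the same logic. If this diverged from choose(), a collector
    // expecting in-order docs could be paired with an out-of-order scorer.
    static boolean scoresOutOfOrder(boolean preferTaat) {
        return choose(false, preferTaat) == Strategy.TAAT;
    }
}
```

Routing both questions through one helper is one way to avoid keeping two copies of the selection logic in sync by hand.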
I also tried to implement the .getChildren() method for BAS and BLS,
but the TAAT strategy exhausts the sub-scorers at the beginning.
In the end, I just throw UnsupportedOperationException in BAS.getChildren() and
BLS.getChildren().
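The shape of that fallback might look like the sketch below. `TaatScorerSketch` is a hypothetical stand-in, not the real BAS or BLS class, and the return type is simplified to `Collection<Object>` for illustration.

```java
import java.util.Collection;

// Hypothetical stand-in for a TAAT-style scorer such as BAS or BLS.
final class TaatScorerSketch {
    // Under TAAT the sub-scorers are consumed up front, so exposing them
    // would hand callers already-exhausted scorers. Refuse instead.
    Collection<Object> getChildren() {
        throw new UnsupportedOperationException(
            "sub-scorers are exhausted by term-at-a-time scoring");
    }
}
```

Throwing makes the limitation visible to callers, rather than returning children that silently yield no documents.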
I have also run more tests to make sure everything is right.
As you can see, the performance of the HighAnd.\*Low.\* cases shown in merge.png
is not good.
Therefore, I ran the HighAnd.\*Low.\* cases with luceneutil's pattern filter, and
the results are as follows.
{code}
Task             QPS baseline   StdDev QPS my_version   StdDev  Pct diff
HighAnd6LowOr            9.44   (6.4%)           9.19   (4.8%)    -2.6% ( -12% - 9%)
HighAnd5LowOr            9.00   (8.8%)           8.85   (7.4%)    -1.6% ( -16% - 16%)
HighAnd3LowOr           11.89   (8.9%)          11.71   (7.8%)    -1.6% ( -16% - 16%)
HighAnd4LowOr           10.78   (7.4%)          10.61   (6.3%)    -1.5% ( -14% - 13%)
HighAnd7LowOr            9.08   (7.2%)           8.94   (5.8%)    -1.5% ( -13% - 12%)
HighAnd8LowOr            6.32   (8.6%)           6.23   (6.9%)    -1.4% ( -15% - 15%)
HighAnd9LowOr            5.71   (5.7%)           5.65   (4.5%)    -1.1% ( -10% - 9%)
PKLookup                98.95   (4.5%)          98.38   (2.4%)    -0.6% ( -7% - 6%)
HighAnd9LowNot           7.49   (3.7%)           7.46   (3.2%)    -0.4% ( -7% - 6%)
HighAnd4LowNot          10.33   (6.4%)          10.31   (6.1%)    -0.2% ( -11% - 13%)
HighAnd8LowNot           6.69   (5.3%)           6.70   (4.9%)     0.1% ( -9% - 10%)
HighAnd7LowNot           6.82   (5.1%)           6.84   (5.0%)     0.3% ( -9% - 10%)
HighAnd6LowNot           9.45   (5.5%)           9.48   (4.7%)     0.3% ( -9% - 11%)
HighAnd3LowNot          10.80   (6.7%)          10.87   (6.1%)     0.6% ( -11% - 14%)
HighAnd5LowNot           4.28   (7.4%)           4.32   (7.1%)     1.0% ( -12% - 16%)
{code}
Every reported range spans zero, so everything looks right.
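As a reading aid for the table above, here is a minimal sketch of how the Pct diff column relates to the two QPS columns. It assumes Pct diff is the relative change of mean QPS against the baseline; that assumption matches the rows shown (the HighAnd6LowOr row comes out at roughly -2.6) but is not taken from luceneutil's source. `PctDiffSketch` and its method names are hypothetical.

```java
final class PctDiffSketch {
    // Relative change of the candidate QPS against the baseline, in percent.
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    // A range that contains zero gives no clear evidence of a change.
    static boolean spansZero(double lowPct, double highPct) {
        return lowPct <= 0.0 && highPct >= 0.0;
    }

    public static void main(String[] args) {
        // HighAnd6LowOr row: 9.44 QPS baseline vs 9.19 QPS with the patch.
        System.out.printf("%.1f%%%n", pctDiff(9.44, 9.19));
        // Reported range for the same row: ( -12% - 9%).
        System.out.println(spansZero(-12, 9));
    }
}
```

A row whose range lies entirely on one side of zero, such as ( -12% - -6%), would return false from `spansZero` and is worth a closer look.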
I have also run tests for more complicated tasks.
{code}
Task                   QPS baseline   StdDev QPS my_version   StdDev  Pct diff
LowAnd6LowOr6LowNot           31.59   (1.0%)          28.52   (2.4%)    -9.7% ( -12% - -6%)
HighAnd6LowOr6LowNot           6.10   (2.7%)           5.76   (4.0%)    -5.6% ( -11% - 1%)
MedAnd6LowOr6LowNot            7.33   (2.3%)           7.03   (3.1%)    -4.0% ( -9% - 1%)
HighAnd6MedOr6LowNot           3.51   (1.5%)           3.49   (2.6%)    -0.6% ( -4% - 3%)
PKLookup                      95.99   (5.1%)          95.48   (4.9%)    -0.5% ( -10% - 9%)
HighAnd6MedOr6MedNot           1.96   (1.3%)           1.97   (2.5%)     0.4% ( -3% - 4%)
MedAnd6MedOr6MedNot            2.34   (1.2%)           2.35   (2.3%)     0.5% ( -2% - 4%)
HighAnd6LowOr6HighNot          1.31   (1.1%)           1.33   (2.4%)     0.9% ( -2% - 4%)
HighAnd6LowOr6MedNot           3.08   (1.5%)           3.12   (2.7%)     1.2% ( -2% - 5%)
MedAnd6LowOr6MedNot            3.72   (1.4%)           3.89   (2.6%)     4.8% ( 0% - 8%)
HighAnd6MedOr6HighNot          1.40   (1.0%)           1.53   (2.4%)     9.3% ( 5% - 12%)
LowAnd6LowOr6MedNot            9.23   (2.1%)          10.19   (2.7%)    10.4% ( 5% - 15%)
LowAnd6LowOr6HighNot           6.04   (2.5%)           6.74   (2.9%)    11.6% ( 6% - 17%)
LowAnd6HighOr6HighNot          4.15   (3.4%)           4.72   (4.2%)    13.8% ( 5% - 22%)
MedAnd6MedOr6HighNot           1.65   (1.2%)           1.91   (2.2%)    15.7% ( 12% - 19%)
MedAnd6LowOr6HighNot           2.42   (1.7%)           2.80   (2.7%)    16.0% ( 11% - 20%)
LowAnd6HighOr6LowNot           4.69   (2.9%)           5.45   (3.7%)    16.1% ( 9% - 23%)
MedAnd6MedOr6LowNot            3.45   (1.2%)           4.04   (2.1%)    17.1% ( 13% - 20%)
LowAnd6MedOr6LowNot            8.77   (1.6%)          10.38   (2.4%)    18.4% ( 14% - 22%)
LowAnd6MedOr6MedNot            6.36   (2.6%)           7.55   (3.5%)    18.6% ( 12% - 25%)
LowAnd6MedOr6HighNot           5.48   (3.1%)           6.51   (3.9%)    18.8% ( 11% - 26%)
LowAnd6HighOr6MedNot           5.77   (3.1%)           6.86   (4.3%)    18.9% ( 11% - 27%)
MedAnd6HighOr6HighNot          1.22   (1.0%)           1.46   (2.0%)    19.8% ( 16% - 23%)
HighAnd6HighOr6MedNot          1.32   (1.1%)           1.59   (2.0%)    20.7% ( 17% - 24%)
MedAnd6HighOr6MedNot           1.72   (1.5%)           2.09   (2.2%)    21.3% ( 17% - 25%)
HighAnd6HighOr6HighNot         1.26   (1.2%)           1.56   (2.1%)    24.0% ( 20% - 27%)
HighAnd6HighOr6LowNot          1.54   (1.3%)           1.92   (2.0%)    24.7% ( 21% - 28%)
MedAnd6HighOr6LowNot           2.26   (1.5%)           2.85   (1.9%)    26.3% ( 22% - 30%)
{code}
All look good.
If there are no other problems, I will begin to clean up the unused logic in
the code, such as BLS, and refine the javadoc.
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, SIZE.perf, all.perf,
> luceneutil-score-equal.patch, luceneutil-score-equal.patch, merge.perf,
> merge.png, perf.png, stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared
> to the other clauses, that BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!