[
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060615#comment-14060615
]
Da Huang edited comment on LUCENE-4396 at 7/14/14 1:06 PM:
-----------------------------------------------------------
I have run tests with different SIZE values for the bucketTable.
The file 'SIZE.perf' contains the raw test results.
'stat.cpp' is a C++ program that computes statistics over *.perf files.
You can compile it with 'g++ stat.cpp -std=c++0x -o stat'
and run it with './stat < SIZE.perf'.
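The real SIZE.perf layout and stat.cpp logic are not reproduced here. As a rough illustration of this kind of aggregation, here is a minimal sketch. It assumes a hypothetical input format with one "<sizeLabel> <task> <pctDiff>" measurement per line, and it assumes that each reported value is the mean over runs. The names readStats, formatStats, Mean and Stats are illustrative only. The sketch avoids C++14/17 features, so it should also build with the same -std=c++0x flag.

```cpp
#include <cassert>
#include <iomanip>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// ASSUMED input format (hypothetical, not the real SIZE.perf layout):
// one measurement per line, "<sizeLabel> <task> <pctDiff>",
// e.g. "size7 HighAndSomeHighOr 12.9".

// Running mean of the percent differences seen for one (task, size) pair.
struct Mean {
  double sum = 0.0;
  int count = 0;
  double value() const { return count > 0 ? sum / count : 0.0; }
};

// task -> size label -> mean; std::map keeps both keys sorted as strings.
typedef std::map<std::string, std::map<std::string, Mean>> Stats;

// Hypothetical helper: accumulate every measurement from the stream.
Stats readStats(std::istream& in) {
  Stats stats;
  std::string size, task;
  double pct = 0.0;
  while (in >> size >> task >> pct) {
    Mean& m = stats[task][size];
    m.sum += pct;
    m.count += 1;
  }
  return stats;
}

// Hypothetical helper: one tab-separated row per task, one column per size
// label, each cell the mean percent difference ("-" if that pair is missing).
std::string formatStats(const Stats& stats) {
  std::set<std::string> sizes;
  for (const auto& task : stats)
    for (const auto& cell : task.second) sizes.insert(cell.first);

  std::ostringstream out;
  out << "Task";
  for (const auto& s : sizes) out << '\t' << s;
  out << '\n' << std::fixed << std::setprecision(1);
  for (const auto& task : stats) {
    out << task.first;
    for (const auto& s : sizes) {
      auto it = task.second.find(s);
      out << '\t';
      if (it == task.second.end()) out << '-';
      else out << it->second.value();
    }
    out << '\n';
  }
  return out.str();
}
```

Wrapping this in a main() that runs std::cout << formatStats(readStats(std::cin)); would give the same './stat < SIZE.perf' usage. Because std::map orders labels as strings, size10 and size11 sort before size5, which is the same column order the table below uses.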
Running it on SIZE.perf should produce the following statistics:
{code}
Task               size10  size11   size5   size6   size7   size8   size9
HighAndSomeHighNot  -14.5     4.0     6.6    -3.0     5.2    10.0*    3.4
HighAndSomeHighOr     2.4    10.9    17.3    17.4    12.9    18.3    21.3*
HighAndSomeLowNot   -36.8   -37.3   -47.8   -47.8   -40.2   -42.2   -41.5
HighAndSomeLowOr    -45.1   -46.4   -47.9   -46.2   -38.7   -39.7   -44.9
HighAndTonsHighNot  162.4*  145.1   149.1   130.1   142.9   144.7   143.7
HighAndTonsHighOr   154.8*  146.5   154.0   137.8   144.9   150.0   149.1
HighAndTonsLowNot   -27.0   -17.4   -73.7   -49.6   -40.1   -28.6   -15.6
HighAndTonsLowOr    -28.7   -14.3   -63.8   -44.8   -33.0   -24.4   -13.9
LowAndSomeHighNot     3.0     0.2     4.5     6.2*    5.7     6.2*    4.7
LowAndSomeHighOr      5.3     1.4     6.8     6.7     7.7*    5.8     6.6
LowAndSomeLowNot     -6.3   -24.4     3.7*    0.8     1.7    -2.3    -4.0
LowAndSomeLowOr     -10.3   -22.7     2.2*    2.0     1.7    -2.3    -8.8
LowAndTonsHighNot    27.3*   21.4    22.5    21.5    21.0    23.8    26.5
LowAndTonsHighOr     23.1    28.2    24.2    23.9    29.1*   27.5    28.2
LowAndTonsLowNot     33.0    46.5    39.1    33.4    30.0    47.2*   44.3
LowAndTonsLowOr      45.7*   34.6    29.9    36.8    45.3    40.9    38.1
{code}
size7 means the bucketTable's size is 1 << 7 (128 buckets).
The '*' characters, added by hand, mark the best value for a task.
It seems we could get better results on the \*Some\* tasks by combining size9
with size5.
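The following sketch makes the SIZE parameter concrete. It is a minimal C++ bucket table whose size is 1 << sizeLog2, loosely modelled on the bucket-table idea behind BooleanScorer rather than copied from Lucene's Java code. The class BucketTable, its methods, and the doc & mask slot mapping are illustrative assumptions. Docs are buffered in aligned windows of 1 << sizeLog2 consecutive IDs. Within one window every doc gets its own slot, so per-doc scores can be accumulated without hashing.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bucket table (not Lucene's code) with 1 << sizeLog2 slots,
// so "size7" in the table above corresponds to sizeLog2 = 7.
class BucketTable {
 public:
  explicit BucketTable(int sizeLog2)
      : size_(checkedSize(sizeLog2)), mask_(size_ - 1),
        docs_(size_, -1), scores_(size_, 0.0f) {}

  int size() const { return size_; }

  // First doc ID of the aligned window that contains `doc` (doc >= 0).
  int windowBase(int doc) const { return doc & ~mask_; }

  // Slot used by `doc`; distinct for every doc inside one window.
  int slot(int doc) const { return doc & mask_; }

  // Add one clause's score for `doc`, which must lie in the current window.
  void collect(int doc, float score) {
    const int s = slot(doc);
    if (docs_[s] != doc) {  // empty slot, or a stale doc from an earlier window
      docs_[s] = doc;
      scores_[s] = 0.0f;
    }
    scores_[s] += score;
  }

  // Accumulated score for `doc`, or 0 if no clause has hit it in this window.
  float score(int doc) const {
    const int s = slot(doc);
    return docs_[s] == doc ? scores_[s] : 0.0f;
  }

 private:
  static int checkedSize(int sizeLog2) {
    assert(sizeLog2 >= 0 && sizeLog2 < 31);
    return 1 << sizeLog2;
  }

  int size_;
  int mask_;
  std::vector<int> docs_;
  std::vector<float> scores_;
};
```

With sizeLog2 = 7, doc 300 falls in the window starting at 256 and uses slot 44. With sizeLog2 = 11, the same doc sits in the window [0, 2048). Changing SIZE therefore trades the number of windows against the amount of memory touched per window.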
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, SIZE.perf,
> luceneutil-score-equal.patch, luceneutil-score-equal.patch, stat.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that, unless the MUST clauses have a very low hit count compared
> to the other clauses, BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, e.g. if the MUST clause suddenly skips 1000000 docs, then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!
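The description above suggests two things: letting BooleanScorer handle MUST clauses, and calling .advance() on the SHOULD clauses when the MUST clause jumps far ahead. The sketch below is a minimal C++ illustration of that idea, under several assumptions. It handles a single MUST clause and uses in-memory sorted doc lists. DocIterator is a hypothetical iterator whose advance() follows the "first doc >= target" contract of Lucene's DocIdSetIterator; it is not Lucene's API. The function mustWithShould is also hypothetical. It only counts SHOULD matches per required doc. Real scoring and the firstDocID heuristic for choosing between scorers are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <utility>
#include <vector>

// Sentinel for an exhausted iterator (mirrors the NO_MORE_DOCS idea; value assumed).
constexpr int NO_MORE_DOCS = INT_MAX;

// Hypothetical postings iterator over a sorted doc list.
class DocIterator {
 public:
  explicit DocIterator(std::vector<int> docs) : docs_(std::move(docs)) {}

  // Current doc, or NO_MORE_DOCS once exhausted.
  int docID() const { return pos_ < docs_.size() ? docs_[pos_] : NO_MORE_DOCS; }

  // Step to the next doc in the list.
  void next() {
    if (pos_ < docs_.size()) ++pos_;
  }

  // Move to the first doc >= target (never backwards) via binary search.
  int advance(int target) {
    auto from = docs_.begin() + static_cast<std::ptrdiff_t>(pos_);
    pos_ = static_cast<std::size_t>(
        std::lower_bound(from, docs_.end(), target) - docs_.begin());
    return docID();
  }

 private:
  std::vector<int> docs_;
  std::size_t pos_ = 0;
};

// Hypothetical: one MUST clause plus SHOULD clauses, processed in windows of
// 1 << sizeLog2 doc IDs. Returns (doc, number of SHOULD clauses that also
// matched) for every doc the MUST clause matches, in doc order.
std::vector<std::pair<int, int>> mustWithShould(DocIterator& must,
                                                std::vector<DocIterator>& should,
                                                int sizeLog2) {
  const int size = 1 << sizeLog2;
  const int mask = size - 1;
  std::vector<int> slotDoc(size, -1);    // which doc owns each slot
  std::vector<int> shouldHits(size, 0);  // SHOULD matches per slot
  std::vector<int> windowDocs;           // required docs in the current window
  std::vector<std::pair<int, int>> out;

  while (must.docID() != NO_MORE_DOCS) {
    // Jump straight to the window holding the next required doc, so windows
    // the MUST clause skips over are never scanned.
    const long long base = must.docID() & ~mask;
    const long long end = base + size;

    windowDocs.clear();
    for (; must.docID() != NO_MORE_DOCS && must.docID() < end; must.next()) {
      const int d = must.docID();
      slotDoc[d & mask] = d;
      shouldHits[d & mask] = 0;
      windowDocs.push_back(d);
    }

    // advance() each SHOULD clause to the window start instead of stepping
    // through every doc the MUST clause ruled out, then count its matches.
    for (DocIterator& it : should) {
      for (it.advance(static_cast<int>(base));
           it.docID() != NO_MORE_DOCS && it.docID() < end; it.next()) {
        const int d = it.docID();
        if (slotDoc[d & mask] == d) ++shouldHits[d & mask];
      }
    }

    for (int d : windowDocs) out.emplace_back(d, shouldHits[d & mask]);
  }
  return out;
}
```

Take a MUST clause matching {3, 5, 300, 1000000} and a window of 1 << 7 docs. Only three windows are ever scanned, and each SHOULD iterator crosses the skipped range with a single advance() call per window instead of visiting every doc in between.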
--
This message was sent by Atlassian JIRA
(v6.2#6252)