[
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Da Huang updated LUCENE-4396:
-----------------------------
    Attachment: tasks.cpp
                LUCENE-4396.patch
                And.tasks
The patch is based on git mirror commit 67d17eb81b754fa242bb91e1b91070fd8b38ecd9.
In this patch, I removed the unused classes, encapsulated some functions, and
fixed some bugs.
In addition, the cases in the tasks file used before are strongly related to
one another, which I think is not good for benchmarking. Therefore, I generated
a new tasks file.
And.tasks is the new tasks file, and tasks.cpp is the program that generates
it.
You can generate the tasks file by running:
{code}
g++ tasks.cpp -std=c++0x -o tasks
./tasks < wikimedium.10M.nostopwords.tasks > And.tasks
{code}
The performance on the new tasks file is as follows.
{code}
Task              QPS baseline   StdDev  QPS my_version   StdDev  Pct diff
HighAnd5LowNot            5.40   (5.1%)            4.88   (4.2%)    -9.6% ( -18% -    0%)
HighAnd5LowOr             7.05  (10.2%)            6.87   (3.8%)    -2.6% ( -15% -   12%)
LowAnd5LowNot            27.17   (2.1%)           26.47   (2.6%)    -2.6% (  -7% -    2%)
HighAnd5HighOr            1.13   (3.8%)            1.11   (2.2%)    -1.8% (  -7% -    4%)
LowAnd5LowOr             31.82   (2.6%)           31.35   (2.3%)    -1.5% (  -6% -    3%)
PKLookup                 98.80   (5.2%)          102.02   (6.3%)     3.3% (  -7% -   15%)
HighAnd5HighNot           1.95   (1.0%)            2.04   (2.1%)     4.7% (   1% -    7%)
LowAnd5HighNot            9.46   (2.9%)           10.32   (2.7%)     9.0% (   3% -   15%)
LowAnd5HighOr             7.56   (2.8%)            8.42   (2.8%)    11.4% (   5% -   17%)
LowAnd60HighOr            0.51   (2.5%)            0.82   (4.8%)    58.7% (  50% -   67%)
LowAnd60LowNot            2.61   (1.0%)            4.64   (3.4%)    78.0% (  72% -   83%)
HighAnd60LowNot           1.30   (1.2%)            2.36   (3.7%)    81.1% (  75% -   87%)
HighAnd60LowOr            1.18   (1.3%)            2.15   (3.7%)    82.0% (  76% -   88%)
LowAnd60LowOr             2.25   (0.6%)            4.61   (4.2%)   104.7% (  99% -  110%)
HighAnd60HighOr           0.10   (0.7%)            0.26   (4.8%)   151.2% ( 144% -  157%)
LowAnd60HighNot           0.53   (2.5%)            1.62   (8.0%)   204.0% ( 188% -  220%)
HighAnd60HighNot          0.14   (0.9%)            0.59   (8.9%)   328.4% ( 315% -  341%)
{code}
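For readers checking the numbers: the Pct diff column looks consistent with the
relative change in mean QPS of my_version over baseline. That reading is an
assumption drawn from the figures in the table, not from the benchmark tool's
source. The sketch below recomputes it for three rows; the helper name
pct_diff is hypothetical, and the range shown in parentheses in the table is
not reproduced here.

```python
def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Relative change of candidate over baseline, in percent.

    Assumed formula (inferred from the table, not from the benchmark tool).
    """
    if baseline_qps <= 0:
        raise ValueError("baseline QPS must be positive")
    return 100.0 * (candidate_qps - baseline_qps) / baseline_qps


# Mean QPS values (baseline, my_version) copied from three rows of the table.
rows = {
    "HighAnd5LowNot": (5.40, 4.88),
    "PKLookup": (98.80, 102.02),
    "LowAnd5HighOr": (7.56, 8.42),
}

for task, (baseline, mine) in rows.items():
    # Print the task name and the signed percentage change, one decimal place.
    print(f"{task:<16} {pct_diff(baseline, mine):+6.1f}%")
```

Because the table prints QPS rounded to two decimals, recomputing from the
printed values can differ slightly from the reported Pct diff for slow tasks
(for example, LowAnd60HighOr gives roughly 60.8% from 0.51 and 0.82, against
the reported 58.7%).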
My next step is to run more tests to find better rules and to verify
correctness. I think this can be finished by this Friday.
As the suggested pencils-down date is coming, I will also begin to clean up the
code, improve the comments, and write the concluding documentation.
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, SIZE.perf, all.perf, luceneutil-score-equal.patch,
> luceneutil-score-equal.patch, stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared
> to the other clauses, that BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!
--
This message was sent by Atlassian JIRA
(v6.2#6252)