[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Da Huang (JIRA) Wed, 04 Jun 2014 19:05:19 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018417#comment-14018417
 ]


Da Huang commented on LUCENE-4396:
----------------------------------

About the score differences between BS and BS2 (the same applies to BNS vs. BS2)

Currently the scores of BS and BS2 differ when executing a query like 
"+a b c d ...".

I have been told that the reason is indicated by 
the TODO in ReqOptSumScorer.score(), which says
{code}
// TODO: sum into a double and cast to float if we ever send required clauses to BS1
{code}

However, I don't think that is the cause. The score difference comes from 
the different order in which the two scorers add up the clause scores.

Suppose a doc matches the query "+a b c d". The score calculated by BS is 
{code}
BS.score(doc) = ((a.score() + b.score()) + c.score()) + d.score()
{code}
while the score calculated by BS2 is 
{code}
BS2.score(doc) = a.score() + (float)(b.score() + c.score() + d.score())
{code}

Notice that in BS2 we can only get the float value of 
(b.score() + c.score() + d.score()), which is what optScorer.score() returns.
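
To make the ordering effect concrete, here is a minimal Java sketch. It is 
not Lucene code: the class name, method names, and score values are made up 
for illustration, and modelling the BS2 optional sum as a double 
accumulation cast to float is an assumption based on the cast in the 
expression above.

```java
public final class ScoreOrderDemo {

    // BS-style order: one float accumulator, clauses added left to right.
    static float bsOrder(float a, float b, float c, float d) {
        return ((a + b) + c) + d;
    }

    // BS2-style order: the optional clauses are summed first (assumed here
    // to happen in a double), reduced to a float, then added to the
    // required clause's score.
    static float bs2Order(float a, float b, float c, float d) {
        float optional = (float) ((double) b + c + d);
        return a + optional;
    }

    public static void main(String[] args) {
        // Hypothetical scores: 2^-24 is half the float spacing just above 1.0f.
        float tiny = 0x1.0p-24f;
        System.out.println("BS  order: " + bsOrder(1.0f, tiny, tiny, tiny));
        System.out.println("BS2 order: " + bs2Order(1.0f, tiny, tiny, tiny));
    }
}
```

With these inputs, bsOrder rounds each tiny addend away and returns exactly 
1.0f, while bs2Order adds 3 * 2^-24 in one step and returns 1.0f + 2^-22. 
The clause scores are identical; only the grouping differs.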

Furthermore, I have noticed that we can actually control BS's 
score calculation order, so that 
{code}
BS.score(doc) = a.score() + ((b.score() + c.score()) + d.score())
{code}
However, for BS2 we do not know the order in which 
(b.score() + c.score() + d.score()) is calculated, as it is determined by 
each scorer's position in a heap. I still think this matters little.

I will rearrange the calculation order of BS.score() in the next patch 
to see whether it works.
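
As a rough illustration of the rearranged order and of why the order inside 
the optional sum can still leak into the result, consider the following 
sketch. Everything in it (class, methods, score values) is hypothetical, and 
it sums the optional clauses in float, which is the worst case; a scorer 
that accumulates them in a double would be far less sensitive to the order.

```java
public final class OptionalSumOrderDemo {

    // Adds the optional clause scores in the order given, in float precision.
    static float sumInOrder(float... scores) {
        float acc = 0f;
        for (float s : scores) {
            acc += s;
        }
        return acc;
    }

    // Rearranged order: required score plus the already-summed optional part.
    static float rearranged(float required, float... optional) {
        return required + sumInOrder(optional);
    }

    public static void main(String[] args) {
        float tiny = 0x1.0p-24f; // hypothetical score, half an ulp of 1.0f
        // Same optional clauses, visited in two different orders.
        System.out.println(rearranged(0.5f, 1.0f, tiny, tiny));
        System.out.println(rearranged(0.5f, tiny, tiny, 1.0f));
    }
}
```

Here the first call yields exactly 1.5f and the second yields 1.5f + 2^-23, 
a difference of one unit in the last place. That is consistent with the 
expectation that the heap order matters little, but it also means exact 
score equality between the two scorers is not guaranteed by the 
rearrangement alone.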


> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, luceneutil-score-equal.patch, luceneutil-score-equal.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!


