[jira] [Reopened] (LUCENE-6198) two phase intersection

Adrien Grand (JIRA) Wed, 11 Feb 2015 06:46:34 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Adrien Grand reopened LUCENE-6198:
----------------------------------
    Lucene Fields:   (was: New)

I'll try to summarize API challenges that have been mentioned or that I can 
think of:

 - should match confirmation be built into DocIdSetIterator (i.e. by adding a 
matches() method and requiring callers to always verify matches)? While it 
would work, one issue I have is that it would also make simple cases such as 
TermScorer more complicated. So I like having an optional method or marker 
interface better.

 - ideally this would not be intrusive and would just be an incremental 
improvement over what we have today

 - this thing cannot be a marker interface, otherwise wrappers like 
ConstantScoreQuery could not work properly

 - we need to somehow reuse the DocIdSetIterator abstraction for code reuse 
(approximations cannot be a totally different object)

 - one concern was that it should work well for queries and filters, but since 
we are slowly merging both, it would probably be ok to make it work for queries 
only (which potentially means that we could expose methods only on Scorer 
instead of DISI, at least as a start).

 - should we extend DocIdSetIterator and add a 'matches' method (option 1), or 
have another class that exposes a DocIdSetIterator 'approximation' and a 
'matches' method (option 2)? While the patch on LUCENE-6198 uses option 1, I 
like the fact that with option 2 we do not extend DocIdSetIterator and more 
clearly separate the approximation from the confirmation (like the API 
proposal on SOLR-7044).

 - in a conjunction disi, should there be a way to configure the order in which 
confirmations are performed (similarly to the cost API, by trying to confirm 
the cheapest instances first)? I think so, but we can probably delay this 
problem until later.
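
To make the last point concrete, here is a rough sketch of what "confirm the 
cheapest instances first" could look like. Nothing here is from the patch: the 
Confirmer class, its cost field and the helper names are all hypothetical, and 
the costs are made-up numbers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

class ConfirmationOrderSketch {
    /** Hypothetical confirmation step with a caller-supplied cost estimate. */
    static final class Confirmer {
        final String name;
        final long cost;          // assumed estimate, not a real Lucene API
        final IntPredicate check; // stands in for the expensive verification
        int calls = 0;

        Confirmer(String name, long cost, IntPredicate check) {
            this.name = name;
            this.cost = cost;
            this.check = check;
        }

        boolean matches(int doc) {
            calls++;
            return check.test(doc);
        }
    }

    /** Returns a copy of the confirmers ordered from cheapest to most expensive. */
    static List<Confirmer> cheapestFirst(List<Confirmer> confirmers) {
        List<Confirmer> sorted = new ArrayList<>(confirmers);
        sorted.sort(Comparator.comparingLong(c -> c.cost));
        return sorted;
    }

    /** Confirms a candidate doc, stopping at the first confirmer that rejects it. */
    static boolean confirmAll(List<Confirmer> ordered, int doc) {
        for (Confirmer c : ordered) {
            if (!c.matches(doc)) {
                return false;
            }
        }
        return true;
    }

    /** Runs docs 0..11 through two confirmers; returns {hits, phraseCalls, geoCalls}. */
    static int[] demo() {
        Confirmer geo = new Confirmer("geo", 100, d -> d % 3 == 0);
        Confirmer phrase = new Confirmer("phrase", 10, d -> d % 2 == 0);
        List<Confirmer> input = new ArrayList<>();
        input.add(geo);
        input.add(phrase);
        List<Confirmer> ordered = cheapestFirst(input);
        int hits = 0;
        for (int doc = 0; doc < 12; doc++) {
            if (confirmAll(ordered, doc)) {
                hits++;
            }
        }
        return new int[] { hits, phrase.calls, geo.calls };
    }
}
```

In this toy setup the expensive confirmer only runs on documents that the 
cheaper one has already accepted, which is the behavior an ordering option 
would be meant to give.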

Here is a new patch which is very similar to the current one, but with two main 
differences:
 - the approximation DISI has been replaced with a TwoPhaseDocIdSetIterator 
class which exposes an iterator called 'approximation' and a 'boolean 
matches()' method
 - approximation is only exposed on Scorer
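
As an illustration only (this is not the code from the patch), the shape 
described above could be sketched as follows. DocIterator is a simplified 
stand-in for DocIdSetIterator, and the helpers docsMatching, intersect and demo 
are assumptions made up for the example; only the 'approximation' iterator and 
the 'boolean matches()' method come from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class TwoPhaseSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Simplified stand-in for DocIdSetIterator; not the real Lucene class. */
    interface DocIterator {
        int docID();
        int nextDoc();
        int advance(int target);
    }

    /** Assumed shape: a cheap approximation plus a per-document confirmation. */
    abstract static class TwoPhaseDocIdSetIterator {
        final DocIterator approximation;

        TwoPhaseDocIdSetIterator(DocIterator approximation) {
            this.approximation = approximation;
        }

        /** Confirms whether the approximation's current doc really matches. */
        abstract boolean matches();
    }

    static int confirmations = 0;

    /** Toy iterator over the docs in [0, maxDoc) accepted by a cheap predicate. */
    static DocIterator docsMatching(int maxDoc, IntPredicate accept) {
        return new DocIterator() {
            int doc = -1;
            public int docID() { return doc; }
            public int nextDoc() { return advance(doc + 1); }
            public int advance(int target) {
                for (int d = target; d < maxDoc; d++) {
                    if (accept.test(d)) {
                        return doc = d;
                    }
                }
                return doc = NO_MORE_DOCS;
            }
        };
    }

    /** Intersects on the approximation first; confirms only agreed candidates. */
    static List<Integer> intersect(DocIterator lead, TwoPhaseDocIdSetIterator other) {
        List<Integer> hits = new ArrayList<>();
        DocIterator approx = other.approximation;
        int doc = lead.nextDoc();
        while (doc != NO_MORE_DOCS) {
            int candidate = approx.docID() < doc ? approx.advance(doc) : approx.docID();
            if (candidate == doc) {
                if (other.matches()) {
                    hits.add(doc);
                }
                doc = lead.nextDoc();
            } else {
                doc = candidate == NO_MORE_DOCS ? NO_MORE_DOCS : lead.advance(candidate);
            }
        }
        return hits;
    }

    /** A term on every 10th doc AND a "phrase" whose approximation is every doc. */
    static List<Integer> demo() {
        confirmations = 0;
        DocIterator term = docsMatching(100, d -> d % 10 == 0);
        TwoPhaseDocIdSetIterator phrase =
            new TwoPhaseDocIdSetIterator(docsMatching(100, d -> true)) {
                @Override
                boolean matches() {
                    confirmations++; // stands in for reading positions
                    return approximation.docID() % 4 == 0;
                }
            };
        return intersect(term, phrase);
    }

    public static void main(String[] args) {
        System.out.println(demo() + " after " + confirmations + " confirmations");
    }
}
```

With these toy inputs the expensive check runs once per document of the term 
clause (10 times) rather than on every document the phrase approximation 
visits, and the conjunction returns docs 0, 20, 40, 60 and 80.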

> two phase intersection
> ----------------------
>
>                 Key: LUCENE-6198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6198
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6198.patch
>
>
> Currently some scorers have to do a lot of per-document work to determine if 
> a document is a match. The simplest example is a phrase scorer, but there are 
> others (spans, sloppy phrase, geospatial, etc).
> Imagine a conjunction with two MUST clauses, one that is a term that matches 
> all odd documents, another that is a phrase matching all even documents. 
> Today this conjunction will be very expensive, because the zig-zag 
> intersection is reading a ton of useless positions.
> The same problem happens with filteredQuery and anything else that acts like 
> a conjunction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

