[
https://issues.apache.org/jira/browse/LUCENE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrien Grand reopened LUCENE-6198:
----------------------------------
Lucene Fields: (was: New)
I'll try to summarize API challenges that have been mentioned or that I can
think of:
- should match confirmation be built into DocIdSetIterator (i.e. adding a
matches() method and requiring callers to always verify matches)? While it
would work, one issue I have is that it would also make simple cases such
as TermScorer more complicated. So I like having an optional method or marker
interface better.
- ideally this would not be intrusive and would just be an incremental
improvement over what we have today
- this thing cannot be a marker interface, otherwise wrappers like
ConstantScoreQuery could not work properly
- we need to somehow reuse the DocIdSetIterator abstraction for code reuse
(approximations cannot be a totally different object)
- one concern was that it should work well for queries and filters, but since
we are slowly merging both, it would probably be ok to make it work for queries
only (which potentially means that we could expose methods only on Scorer
instead of DISI, at least as a start).
- should we extend DocIdSetIterator and add a 'matches' method (option 1), or
have another class that exposes a DocIdSetIterator 'approximation' and a
'matches' method (option 2)? While the patch on LUCENE-6198 uses option 1, I
like the fact that with option 2 we do not extend DocIdSetIterator and more
clearly separate the approximation from the confirmation (like the API
proposal on SOLR-7044).
- in a conjunction DISI, should there be a way to configure the order in which
confirmations are performed (somewhat like the cost API, by trying to confirm
the cheapest instances first)? I think so, but we can probably defer this
problem until later.
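
To make the last point concrete, here is a rough sketch of what "confirm the
cheapest instances first" could look like. Everything in it is an assumption
for illustration only: the Confirmer class, its cost field and the
byCost/confirm helpers are hypothetical names and are not part of the patch
or of Lucene.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

public class CostOrderedConfirmation {

    /** Hypothetical confirmation step: an estimated cost plus the check itself. */
    static final class Confirmer {
        final long cost;          // assumed per-document cost estimate
        final IntPredicate check; // the (possibly expensive) verification
        int calls = 0;            // counts how often the check actually ran

        Confirmer(long cost, IntPredicate check) {
            this.cost = cost;
            this.check = check;
        }

        boolean matches(int doc) {
            calls++;
            return check.test(doc);
        }
    }

    /** Returns a copy of the confirmers sorted by ascending cost. */
    static List<Confirmer> byCost(List<Confirmer> confirmers) {
        List<Confirmer> ordered = new ArrayList<>(confirmers);
        ordered.sort(Comparator.comparingLong(c -> c.cost));
        return ordered;
    }

    /** Runs the confirmations in order and stops at the first one that fails. */
    static boolean confirm(List<Confirmer> ordered, int doc) {
        for (Confirmer c : ordered) {
            if (!c.matches(doc)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Confirmer expensive = new Confirmer(100, d -> d % 3 == 0);
        Confirmer cheap = new Confirmer(1, d -> d % 2 == 0);
        List<Confirmer> ordered = byCost(List.of(expensive, cheap));
        for (int doc = 0; doc < 6; doc++) {
            confirm(ordered, doc);
        }
        System.out.println("cheap calls: " + cheap.calls
                + ", expensive calls: " + expensive.calls);
    }
}
```

With the cheap check running first, the expensive one only runs on documents
that the cheap one has already accepted.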
Here is a new patch which is very similar to the current one, but with two main
differences:
- the approximation DISI has been replaced with a TwoPhaseDocIdSetIterator
class which exposes an iterator called 'approximation' and a 'boolean
matches()' method
- approximation is only exposed on Scorer
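
For readers who have not opened the patch, the shape described above can be
sketched roughly as follows. This is a simplified illustration and not the
patch itself: DocIterator is a stand-in for DocIdSetIterator, TwoPhase is a
stand-in for TwoPhaseDocIdSetIterator, and CountingPhrase, over() and
intersect() are hypothetical helpers. The conjunction first intersects the
cheap term iterator with the phrase approximation, and only calls matches()
on documents where both agree.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TwoPhaseSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Simplified stand-in for DocIdSetIterator (not Lucene's class). */
    interface DocIterator {
        int docID();
        /** Moves to the first doc >= target; target must be beyond the current doc. */
        int advance(int target);
    }

    /** Assumed shape: a cheap approximation plus a confirmation step. */
    interface TwoPhase {
        DocIterator approximation();
        /** Confirms the doc the approximation is currently positioned on. */
        boolean matches();
    }

    /** Iterator over a sorted array of doc ids. */
    static DocIterator over(final int[] docs) {
        return new DocIterator() {
            int idx = -1;
            public int docID() {
                if (idx < 0) return -1;
                return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
            }
            public int advance(int target) {
                do { idx++; } while (idx < docs.length && docs[idx] < target);
                return docID();
            }
        };
    }

    /** Phrase-like clause: the approximation is a superset of the real matches. */
    static final class CountingPhrase implements TwoPhase {
        final DocIterator approx;
        final Set<Integer> confirmed = new HashSet<>();
        int confirmations = 0; // how many times the expensive check ran

        CountingPhrase(int[] approxDocs, int[] confirmedDocs) {
            this.approx = over(approxDocs);
            for (int d : confirmedDocs) confirmed.add(d);
        }
        public DocIterator approximation() { return approx; }
        public boolean matches() {
            confirmations++;
            return confirmed.contains(approx.docID());
        }
    }

    /** Leapfrogs the term and the approximation, confirming only agreed docs. */
    static List<Integer> intersect(DocIterator term, TwoPhase phrase) {
        List<Integer> hits = new ArrayList<>();
        DocIterator approx = phrase.approximation();
        int doc = term.advance(0);
        while (doc != NO_MORE_DOCS) {
            int other = approx.docID() < doc ? approx.advance(doc) : approx.docID();
            if (other == doc) {
                if (phrase.matches()) hits.add(doc); // second phase
                doc = term.advance(doc + 1);
            } else if (other == NO_MORE_DOCS) {
                break;
            } else {
                doc = term.advance(other); // other > doc: catch the term up
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        CountingPhrase phrase =
                new CountingPhrase(new int[] {2, 3, 4, 8, 9}, new int[] {4, 9});
        List<Integer> hits = intersect(over(new int[] {1, 3, 4, 7, 9}), phrase);
        System.out.println(hits + " after " + phrase.confirmations + " confirmations");
    }
}
```

In this toy run the approximation visits five documents, but matches() is
only called on the three that the term clause also contains (3, 4 and 9),
and two of those (4 and 9) are confirmed.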
> two phase intersection
> ----------------------
>
> Key: LUCENE-6198
> URL: https://issues.apache.org/jira/browse/LUCENE-6198
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Attachments: LUCENE-6198.patch
>
>
> Currently some scorers have to do a lot of per-document work to determine if
> a document is a match. The simplest example is a phrase scorer, but there are
> others (spans, sloppy phrase, geospatial, etc).
> Imagine a conjunction with two MUST clauses, one that is a term that matches
> all odd documents, another that is a phrase matching all even documents.
> Today this conjunction will be very expensive, because the zig-zag
> intersection is reading a ton of useless positions.
> The same problem happens with filteredQuery and anything else that acts like
> a conjunction.