Re: Proposal: Scorer api change

Paul Elschot Wed, 09 Jun 2010 08:36:03 -0700

On Wednesday 09 June 2010 14:40:49, Shai Erera wrote:
> So just to make sure I understand:
> 
> A Matcher is paired w/ a Scorer, and this pairing is done at Query
> construction time ... e.g. if I use QP to construct the Query, I'd need to
> extend QP by providing my custom scorer for relevant Matchers (and reuse the
> scorers logic for the other fragments), and if I programmatically create a
> Query, I'll need to pair its Matcher w/ a Scorer. Is that what you meant?
> 
> How is that different from today's API? At a high level, someone can extend
> BQ and override createScorer .. if Scorer was just the Scorer and BQ had a
> Matcher ...


Have a look at LUCENE-1345 ...

Regards,
Paul Elschot


> 
> BTW, re the note on BM25BQ -- do you think a BM25 Scorer can fit all query
> types? I.e. would you reuse the same instance code for
> Boolean/Term/Phrase/SpanQuery, or would you not need to write a proper BM25
> scoring algorithm depending on the Query type? I'm asking this assuming we
> have a Matcher and Scorer decoupling.
> 
> If you can indeed have one BM25 scoring algorithm that fits all Query types,
> which means it's quite agnostic to the Query executed, and only cares about
> the doc id, and maybe some independent data it can fetch about it from
> elsewhere, then I agree that the current API is not nicely extensible. But
> if not, then I don't see how the Matcher/Scorer change would improve that.
> 
> Perhaps we should describe 2-3 queries, the result query trees and how they
> are evaluated today vs. the Matcher/Scorer approach? It's always easier to
> talk about something when you have an example :)
> 
> Shai
> 
> On Wed, Jun 9, 2010 at 3:16 PM, Earwin Burrfoot <[email protected]> wrote:
> 
> > What I have in mind is basically having two parallel trees - one for
> > matching, one for scoring.
> > Matching tree is completely independent and can be used as a filter
> > with sort-by-field approach, for example.
> > Scoring tree nodes have references to corresponding matching tree
> > nodes, so they can exploit their "current state".
> >
> > Both trees are built with a visitor over some AST produced from
> > textual query, or programmatically.
> > So what you have to do is to write said visitors. Some of the basic
> > scorers can be reused by your custom visitor, so voila - we have nice
> > extensibility by composition, instead of extensibility by inheritance
> > (which sucks). Also, all this custom code is gathered in a single
> > class, instead of being spread over your query derivatives.
> > This is not a final design, lots of things can differ. I.e. - trees
> > don't have to be parallel. If we want some query branch to not affect
> > the score, but do matching, we're currently wrapping it in
> > ConstantScoreQuery; in my design the matcher tree will look as is, but
> > the corresponding scorer tree branch will be replaced by ConstantScore.
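
The two-tree idea quoted above could look roughly like the sketch below. It is only an illustration under stated assumptions: Matcher, TermMatcher, AndMatcher, Scorer, TermFreqScorer, ConstantScorer and SumScorer are hypothetical names invented here, not Lucene classes, and raw term frequency stands in for a real scoring formula. The matcher tree runs on its own; the scorer tree only reads the current state of the matcher nodes it refers to, and the second branch still constrains matching while contributing just a constant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are Lucene classes.
final class TwoTreeSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Matching side: walks doc ids in increasing order, knows nothing about scores. */
    interface Matcher {
        int nextDoc(); // advance and return the new doc id, or NO_MORE_DOCS
    }

    /** Leaf matcher over a sorted postings array; exposes the current term frequency. */
    static final class TermMatcher implements Matcher {
        private final int[] docs;
        private final int[] freqs;
        private int pos = -1;

        TermMatcher(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        public int nextDoc() {
            if (pos < docs.length) pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        // "Current state" that a scoring node may read while positioned on a doc.
        int freq() { return freqs[pos]; }
    }

    /** Conjunction: advances whichever side is behind until both sides agree. */
    static final class AndMatcher implements Matcher {
        private final Matcher left;
        private final Matcher right;

        AndMatcher(Matcher left, Matcher right) {
            this.left = left;
            this.right = right;
        }

        public int nextDoc() {
            int a = left.nextDoc();
            int b = right.nextDoc();
            while (a != b) {
                if (a < b) a = left.nextDoc(); else b = right.nextDoc();
            }
            return a;
        }
    }

    /** Scoring side: scores the doc its matcher nodes are currently positioned on. */
    interface Scorer {
        float score();
    }

    static final class TermFreqScorer implements Scorer {
        private final TermMatcher matcher;
        TermFreqScorer(TermMatcher matcher) { this.matcher = matcher; }
        public float score() { return matcher.freq(); }
    }

    static final class ConstantScorer implements Scorer {
        private final float value;
        ConstantScorer(float value) { this.value = value; }
        public float score() { return value; }
    }

    static final class SumScorer implements Scorer {
        private final Scorer first;
        private final Scorer second;
        SumScorer(Scorer first, Scorer second) { this.first = first; this.second = second; }
        public float score() { return first.score() + second.score(); }
    }

    static List<String> run() {
        TermMatcher a = new TermMatcher(new int[] {1, 3, 5}, new int[] {2, 1, 4});
        TermMatcher b = new TermMatcher(new int[] {3, 5, 7}, new int[] {3, 3, 1});
        Matcher matching = new AndMatcher(a, b);
        // Branch b takes part in matching but only contributes a constant score.
        Scorer scoring = new SumScorer(new TermFreqScorer(a), new ConstantScorer(1.0f));

        List<String> hits = new ArrayList<>();
        for (int doc = matching.nextDoc(); doc != NO_MORE_DOCS; doc = matching.nextDoc()) {
            hits.add(doc + ":" + scoring.score());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Dropping the scorer tree and iterating the AndMatcher alone gives the filter-style use mentioned above, since the matching side never calls into the scoring side.
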
> >
> > 2010/6/9 Shai Erera <[email protected]>:
> > > I don't feel comfortable with the statement "these visitors are then free to
> > > specialize on matchers or not ...". Let's think how this API will be used ..
> > > today, the user has two hooks - the QueryParser and Collector. Collector
> > > allows you to plug in your own and by extending QP you can return your own
> > > Query for different fragments.
> > >
> > > The Query is a full set though - Query + Weight + Scorer. Whether you extend
> > > an existing query and just override one of the methods is up to you, but
> > > still the Query is self contained.
> > >
> > > If we break the Query API down to a Matcher and Scorer, how will you provide
> > > your own Scorer? Collector is independent of the Query - it just collects
> > > the results. Will the Scorer be independent of Query too (and become an
> > > IndexSearcher.search() argument)? I don't think so, 'cause you want to know
> > > which Matcher you're up against in order to write a good Scorer. There's no
> > > point passing in a PhraseScorer if the query does not include any
> > > PhraseMatcher. So will you need to extend Query, to return your own custom
> > > Scorer, for certain fragments? Can't you do it today already (given the API
> > > is not final, is public/protected etc.)
> > >
> > > Earwin - is that what you had in mind? If so, let's think first if the
> > > current API is not sufficient, given that we 'open' it for extension ...
> > > e.g., can someone achieve that by extending PhraseQuery, overriding
> > > createScorer and returning his own? Do we need more than that?
> > >
> > > I'm not saying we should refactor the API to Matcher + Scorer, just thinking
> > > about what we really need to do and what's the best way to achieve that.
> > >
> > > Shai
> > >
> > > On Wed, Jun 9, 2010 at 2:24 PM, Earwin Burrfoot <[email protected]> wrote:
> > >>
> > >> > Can we represent the Query state in some general structure, that
> > >> > no matter which Query you get, you'll know how to score it?
> > >>
> > >> No. You could go for unified interface that allows you to express
> > >> different query states, like a set of untyped key-values, but you'll
> > >> end up switching on these keyvalues in the end.
> > >>
> > >> It's better to define a set of matchers, and then produce visitors
> > >> that compute scores. These visitors are then free to specialize on
> > >> matchers or not, or ignore the whole tree completely.
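
As a rough illustration of "a set of matchers plus visitors that compute scores", here is a minimal sketch. Everything in it is an assumption made for the example: Node, Term, Phrase, And, Visitor, BasicScoring and PhraseAgnosticScoring are hypothetical names, not Lucene API, and the visitors return a string describing the scorer tree they would build rather than real scorer objects. The custom visitor reuses the basic one by composition and specializes only on phrases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the node and visitor types are invented, not Lucene API.
final class ScorerVisitorSketch {
    /** One method per node type, so a visitor can specialize on each or not. */
    interface Visitor<T> {
        T visitTerm(Term term);
        T visitPhrase(Phrase phrase);
        T visitAnd(And and);
    }

    /** A node of the query/matcher tree. */
    interface Node {
        <T> T accept(Visitor<T> visitor);
    }

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
        public <T> T accept(Visitor<T> visitor) { return visitor.visitTerm(this); }
    }

    static final class Phrase implements Node {
        final List<String> words;
        Phrase(String... words) { this.words = Arrays.asList(words); }
        public <T> T accept(Visitor<T> visitor) { return visitor.visitPhrase(this); }
    }

    static final class And implements Node {
        final List<Node> children;
        And(Node... children) { this.children = Arrays.asList(children); }
        public <T> T accept(Visitor<T> visitor) { return visitor.visitAnd(this); }
    }

    /** Visits every child with the given visitor and combines the results. */
    static String sum(And and, Visitor<String> visitor) {
        List<String> parts = new ArrayList<>();
        for (Node child : and.children) {
            parts.add(child.accept(visitor));
        }
        return "Sum(" + String.join(", ", parts) + ")";
    }

    /** Stock visitor; the returned string stands in for a real scorer tree. */
    static final class BasicScoring implements Visitor<String> {
        public String visitTerm(Term term) {
            return "TermScorer(" + term.text + ")";
        }
        public String visitPhrase(Phrase phrase) {
            return "PhraseScorer(" + String.join(" ", phrase.words) + ")";
        }
        public String visitAnd(And and) {
            return sum(and, this);
        }
    }

    /** Custom visitor: delegates terms to the basics, flattens phrases to a constant. */
    static final class PhraseAgnosticScoring implements Visitor<String> {
        private final Visitor<String> basics;
        PhraseAgnosticScoring(Visitor<String> basics) { this.basics = basics; }

        public String visitTerm(Term term) {
            return basics.visitTerm(term);
        }
        public String visitPhrase(Phrase phrase) {
            return "ConstantScore(1.0)";
        }
        public String visitAnd(And and) {
            // Recurse with this visitor so the phrase override applies to children.
            return sum(and, this);
        }
    }

    public static void main(String[] args) {
        Node query = new And(new Term("lucene"), new Phrase("scorer", "api"));
        System.out.println(query.accept(new BasicScoring()));
        System.out.println(query.accept(new PhraseAgnosticScoring(new BasicScoring())));
    }
}
```

All of the custom scoring decisions sit in one visitor class, and the query nodes themselves are never subclassed.
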
> > >>
> > >> --
> > >> Kirill Zakharenko/Кирилл Захаренко ([email protected])
> > >> Phone: +7 (495) 683-567-4
> > >> ICQ: 104465785
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: [email protected]
> > >> For additional commands, e-mail: [email protected]
> > >>
> > >
> > >
> >
> >
> >
> > --
> > Kirill Zakharenko/Кирилл Захаренко ([email protected])
> > Phone: +7 (495) 683-567-4
> > ICQ: 104465785
> 
