On Wednesday 09 June 2010 14:40:49, Shai Erera wrote:
> So just to make sure I understand:
>
> A Matcher is paired w/ a Scorer, and this pairing is done at Query
> construction time ... e.g. if I use QP to construct the Query, I'd need to
> extend QP by providing my custom scorer for relevant Matchers (and reuse the
> scorers' logic for the other fragments), and if I programmatically create a
> Query, I'll need to pair its Matcher w/ a Scorer. Is that what you meant?
>
> How is that different from today's API? At a high level, someone can extend
> BQ and override createScorer .. if Scorer was just the Scorer and BQ had a
> Matcher ...
Have a look at LUCENE-1345 ...

Regards,
Paul Elschot

>
> BTW, re the note on BM25BQ -- do you think a BM25 Scorer can fit all query
> types? I.e. would you reuse the same instance code for
> Boolean/Term/Phrase/SpanQuery, or wouldn't you need to write a proper BM25
> scoring algorithm depending on the Query type? I'm asking this assuming we
> have a Matcher and Scorer decoupling.
>
> If you can indeed have one BM25 scoring algorithm that fits all Query types,
> which means it's quite agnostic to the Query executed, and only cares about
> the doc id, and maybe some independent data it can fetch about it from
> elsewhere, then I agree that the current API is not nicely extensible. But
> if not, then I don't see how the Matcher/Scorer change would improve that.
>
> Perhaps we should describe 2-3 queries, the resulting query trees and how
> they are evaluated today vs. the Matcher/Scorer approach? It's always easier
> to talk about something when you have an example :)
>
> Shai
>
> On Wed, Jun 9, 2010 at 3:16 PM, Earwin Burrfoot <[email protected]> wrote:
>
> > What I have in mind is basically having two parallel trees - one for
> > matching, one for scoring.
> > Matching tree is completely independent and can be used as a filter
> > with sort-by-field approach, for example.
> > Scoring tree nodes have references to corresponding matching tree
> > nodes, so they can exploit their "current state".
> >
> > Both trees are built with a visitor over some AST produced from
> > textual query, or programmatically.
> > So what you have to do is to write said visitors. Some of the basic
> > scorers can be reused by your custom visitor, so voila - we have nice
> > extensibility by composition, instead of extensibility by inheritance
> > (which sucks). Also, all this custom code is gathered in a single
> > class, instead of being spread over your query derivatives.
> > This is not a final design, lots of things can differ. E.g. the trees
> > don't have to be parallel.
> > If we want some query branch to not affect the score, but do matching,
> > we're currently wrapping it in ConstantScoreQuery. In my design the
> > matcher tree will look as is, but the corresponding scorer tree branch
> > will be replaced by ConstantScore.
> >
> > 2010/6/9 Shai Erera <[email protected]>:
> > > I don't feel comfortable with the statement "these visitors are then
> > > free to specialize on matchers or not ...". Let's think how this API
> > > will be used .. today, the user has two hooks - the QueryParser and
> > > Collector. Collector allows you to plug in your own and by extending QP
> > > you can return your own Query for different fragments.
> > >
> > > The Query is a full set though - Query + Weight + Scorer. Whether you
> > > extend an existing query and just override one of the methods is up to
> > > you, but still the Query is self contained.
> > >
> > > If we break the Query API down to a Matcher and Scorer, how will you
> > > provide your own Scorer? Collector is independent of the Query - it just
> > > collects the results. Will the Scorer be independent of Query too (and
> > > become an IndexSearcher.search() argument)? I don't think so, 'cause you
> > > want to know which Matcher you're up against in order to write a good
> > > Scorer. There's no point passing in a PhraseScorer if the query does not
> > > include any PhraseMatcher. So will you need to extend Query, to return
> > > your own custom Scorer, for certain fragments? Can't you do it today
> > > already (given the API is not final, is public/protected etc.)?
> > >
> > > Earwin - is that what you had in mind? If so, let's think first if the
> > > current API is not sufficient, given that we 'open' it for extension ...
> > > e.g., can someone achieve that by extending PhraseQuery, override
> > > createScorer and return his own? Do we need more than that?
> > >
> > > I'm not saying we should refactor the API to Matcher + Scorer, just
> > > thinking about what we really need to do and what's the best way to
> > > achieve that.
> > >
> > > Shai
> > >
> > > On Wed, Jun 9, 2010 at 2:24 PM, Earwin Burrfoot <[email protected]> wrote:
> > >>
> > >> > Can we represent the Query state in some general structure, that no
> > >> > matter which Query you get, you'll know how to score it?
> > >>
> > >> No. You could go for a unified interface that allows you to express
> > >> different query states, like a set of untyped key-values, but you'll
> > >> end up switching on these key-values in the end.
> > >>
> > >> It's better to define a set of matchers, and then produce visitors
> > >> that compute scores. These visitors are then free to specialize on
> > >> matchers or not, or ignore the whole tree completely.
> > >>
> > >> --
> > >> Kirill Zakharenko/Кирилл Захаренко ([email protected])
> > >> Phone: +7 (495) 683-567-4
> > >> ICQ: 104465785
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: [email protected]
> > >> For additional commands, e-mail: [email protected]
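
To make the "two parallel trees" proposal quoted above easier to discuss, here is a minimal Java sketch, for illustration only. Every type in it (Node, Visitor, Matcher, Scorer, NoScore, the two builders) is hypothetical and not part of Lucene; a document is modelled as a plain term-to-frequency map, the "score" is just the raw term frequency, and the constant used for a non-scoring branch is assumed to be 0. It shows a matcher tree and a scorer tree built by two visitors over the same AST, with scorer nodes reading state from their matcher nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types are Lucene classes.
class ParallelTrees {
    // Query AST, produced by a parser or built programmatically.
    interface Node { <T> T accept(Visitor<T> v); }
    interface Visitor<T> { T term(Term t); T and(And a); T noScore(NoScore n); }
    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
        public <T> T accept(Visitor<T> v) { return v.term(this); }
    }
    static final class And implements Node {
        final List<Node> clauses;
        And(List<Node> clauses) { this.clauses = clauses; }
        public <T> T accept(Visitor<T> v) { return v.and(this); }
    }
    /** A branch that must match but must not contribute to the score. */
    static final class NoScore implements Node {
        final Node inner;
        NoScore(Node inner) { this.inner = inner; }
        public <T> T accept(Visitor<T> v) { return v.noScore(this); }
    }

    // Matching tree: usable on its own, e.g. as a filter.
    // A document is modelled as a term -> frequency map (an assumption).
    interface Matcher { boolean matches(Map<String, Integer> doc); }
    static final class TermMatcher implements Matcher {
        final String text;
        int freq; // "current state" that a scorer node may read
        TermMatcher(String text) { this.text = text; }
        public boolean matches(Map<String, Integer> doc) {
            freq = doc.getOrDefault(text, 0);
            return freq > 0;
        }
    }

    // Scoring tree: nodes read the state left behind by their matcher nodes.
    interface Scorer { float score(); }

    static final class MatcherBuilder implements Visitor<Matcher> {
        final Map<Term, TermMatcher> terms = new IdentityHashMap<>();
        public Matcher term(Term t) {
            TermMatcher m = new TermMatcher(t.text);
            terms.put(t, m);
            return m;
        }
        public Matcher and(And a) {
            final List<Matcher> ms = new ArrayList<>();
            for (Node n : a.clauses) ms.add(n.accept(this));
            return doc -> {
                boolean all = true;
                // No short-circuit, so every child refreshes its state.
                for (Matcher m : ms) all &= m.matches(doc);
                return all;
            };
        }
        // The matcher tree looks the same with or without NoScore.
        public Matcher noScore(NoScore n) { return n.inner.accept(this); }
    }

    static final class ScorerBuilder implements Visitor<Scorer> {
        final Map<Term, TermMatcher> terms;
        ScorerBuilder(Map<Term, TermMatcher> terms) { this.terms = terms; }
        public Scorer term(Term t) {
            final TermMatcher m = terms.get(t);
            return () -> (float) m.freq; // toy score: raw term frequency
        }
        public Scorer and(And a) {
            final List<Scorer> ss = new ArrayList<>();
            for (Node n : a.clauses) ss.add(n.accept(this));
            return () -> {
                float sum = 0f;
                for (Scorer s : ss) sum += s.score();
                return sum;
            };
        }
        // The scorer branch is replaced by a constant; 0 is assumed here.
        public Scorer noScore(NoScore n) { return () -> 0f; }
    }

    /** Returns the score, or NaN when the document does not match. */
    static float search(Node query, Map<String, Integer> doc) {
        MatcherBuilder mb = new MatcherBuilder();
        Matcher matcher = query.accept(mb);
        Scorer scorer = query.accept(new ScorerBuilder(mb.terms));
        return matcher.matches(doc) ? scorer.score() : Float.NaN;
    }

    public static void main(String[] args) {
        Node q = new And(Arrays.asList(new Term("lucene"), new NoScore(new Term("java"))));
        Map<String, Integer> doc = new HashMap<>();
        doc.put("lucene", 3);
        doc.put("java", 5);
        System.out.println(search(q, doc)); // "java" must match but adds nothing
    }
}
```

Swapping in a different scoring scheme would then mean writing another ScorerBuilder-like visitor and leaving the matcher side untouched, which is the composition-over-inheritance point made in the thread. Whether a real design would share state this way is left open here.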
