Yeah I got what he meant, but I honestly don't think those delegate calls are an overhead ...
Shai On Tue, Jun 8, 2010 at 8:12 PM, Earwin Burrfoot <[email protected]> wrote: > Shai, his wrapper Scorer will just look like: > DISI getDISI() { > return delegate.getDISI(); > } > > float score(int doc) { > return calcMyAwesomeScore(doc); > } > > this saves delegate.nextDoc(), delegate.advance() indirection calls. > But I already offered a better alternative :) > > On Tue, Jun 8, 2010 at 21:09, Shai Erera <[email protected]> wrote: > > I guess I must be missing something fundamental here :). > > > > If Scorer is defined as you propose, and I create my Scorer which impls > > getDISI() as "return this" - what do I lose? What's wrong w/ Scorer > already > > being a DISI? > > > > You mention "it is just inefficient to pay the method call overhead ..." > - > > what overhead? Are you talking about the decorator delegating the call to > > the wrapped scorer? I really think the compiler can handle that, no? > > Especially if you make your nextDoc/advance final (which probably you > > should) ... > > > > That doesn't seem to justify an API change, break bw completely (even if > we > > do it in 4.0 only) and change all the current Scorers ... > > > > Shai > > > > On Tue, Jun 8, 2010 at 8:01 PM, John Wang <[email protected]> wrote: > >> > >> re: But Scorer is itself an iterator, so what prevents you from calling > >> nextDoc and advance on it without score() > >> > >> Nothing. It is just inefficient to pay the method call overhead just to > >> overload score. > >> > >> re: If I were in your shoes, I'd simply provider a Query wrapper. If CSQ > >> is not good enough I'd just develop my own. > >> > >> That is what I am doing. I am just proposing the change (see my first > >> email) as an improvement. > >> > >> re: Scorer is itself an iterator > >> > >> yes, that is the current definition. The point of the proposal is to > make > >> this change. > >> > >> -John > >> > >> On Tue, Jun 8, 2010 at 9:45 AM, Shai Erera <[email protected]> wrote: > >>> > >>> Well ... I don't know the reason as well and always thought Scorer and > >>> Similarity are confusing. > >>> > >>> But Scorer is itself an iterator, so what prevents you from calling > >>> nextDoc and advance on it without score(). And what would the returned > >>> DISI do when nextDoc is called, if not delegate to its subs? > >>> > >>> If I were in your shoes, I'd simply provider a Query wrapper. If CSQ > >>> is not good enough I'd just develop my own. > >>> > >>> But perhaps others think differently? > >>> > >>> Shai > >>> > >>> On Tuesday, June 8, 2010, John Wang <[email protected]> wrote: > >>> > Hi Shai: > >>> > I am not sure I understand how changing Similarity would solve > this > >>> > problem, wouldn't you need the reader? > >>> > As for PayloadTermQuery, payload is not always the most efficient > >>> > way of storing such data, especially when number of terms << numdocs. > (I am > >>> > not sure accessing the payload when you iterate is a good idea, but > that is > >>> > another discussion) > >>> > > >>> > Yes, what I described is exactly a simple CustomScoreQuery for a > >>> > special use-case. The problem is also in CustomScoreQuery, where > nextDoc and > >>> > advance are calling the sub-scorers as a wrapper. This can be avoided > if the > >>> > Scorer returns an iterator instead. > >>> > > >>> > Separating scoring and doc iteration is a good idea anyway. I > don't > >>> > know the reason to combine them originally. > >>> > Thanks > >>> > -John > >>> > > >>> > > >>> > On Tue, Jun 8, 2010 at 8:47 AM, Shai Erera <[email protected]> wrote: > >>> > > >>> > So wouldn't it make sense to add some method to Similarity? Which > >>> > receives the doc Id in question maybe ... just thinking here. > >>> > > >>> > Factoring Scorer like you propose would create 3 objects for > >>> > scoring/iterating: Scorer (which really becomes an iterator), > Similarity and > >>> > CustomScoreFunction ... > >>> > > >>> > Maybe you can use CustomScoreQuery? or PayloadTermQuery? depends how > >>> > you compute your age decay function (where you pull the data about > the age > >>> > of the document). > >>> > > >>> > Shai > >>> > > >>> > > >>> > On Tue, Jun 8, 2010 at 6:41 PM, John Wang <[email protected]> > wrote: > >>> > Hi Shai: > >>> > Similarity in many cases is not sufficient for scoring. For > >>> > example, to implement age decaying of a document (very useful for > corpuses > >>> > like news or tweets), you want to project the raw tfidf score onto a > time > >>> > curve, say f(x), to do this, you'd have a custom scorer that > decorates the > >>> > underlying scorer from your say, boolean query: > >>> > > >>> > > >>> > > >>> > public float score(){ return myFunc(innerScorer.score());} > >>> > This is fine, but then you would have to do this as well: > >>> > public int nextDoc(){ > >>> > > >>> > > >>> > return innerScorer.nextDoc();} > >>> > and also: > >>> > public int advance(int target){ return innerScorer.advance();} > >>> > The difference here is that nextDoc and advance are called far more > times as > >>> > score. And you are introducing an extra method call for them, which > is not > >>> > insignificant for queries result in large recall sets. > >>> > > >>> > > >>> > > >>> > Hope this makes sense. > >>> > Thanks > >>> > -John > >>> > On Tue, Jun 8, 2010 at 5:02 AM, Shai Erera <[email protected]> wrote: > >>> > I'm not sure I understand what you mean - Scorer is a DISI itself, > and > >>> > the scoring formula is mostly controlled by Similarity. > >>> > > >>> > What will be the benefits of the proposed change? > >>> > > >>> > Shai > >>> > > >>> > On Tue, Jun 8, 2010 at 8:25 AM, John Wang <[email protected]> > wrote: > >>> > > >>> > > >>> > > >>> > > >>> > Hi guys: > >>> > > >>> > I'd like to make a proposal to change the Scorer class/api to the > >>> > following: > >>> > > >>> > > >>> > public abstract class Scorer{ > >>> > DocIdSetIterator getDocIDSetIterator(); > >>> > float score(int docid); > >>> > } > >>> > > >>> > Reasons: > >>> > > >>> > 1) To build a Scorer from an existing Scorer (e.g. that produces raw > >>> > scores from tfidf), one would decorate it, and it would introduce > overhead > >>> > (in function calls) around nextDoc and advance, even if you just want > to > >>> > augment the score method which is called much fewer times. > >>> > > >>> > 2) The current contract forces scoring on the currentDoc in the > >>> > underlying iterator. So once you pass "current", you can no longer > score. In > >>> > one of our use-cases, it is very inconvenient. > >>> > > >>> > What do you think? I can go ahead and open an issue and work on a > patch > >>> > if I get some agreement. > >>> > > >>> > Thanks > >>> > > >>> > -John > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: [email protected] > >>> For additional commands, e-mail: [email protected] > >>> > >> > > > > > > > > -- > Kirill Zakharenko/Кирилл Захаренко ([email protected]) > Phone: +7 (495) 683-567-4 > ICQ: 104465785 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > >
