What do you mean "we are not inlining"? The compiler inlines methods .. at least it tries.
Shai On Tue, Jun 8, 2010 at 8:21 PM, John Wang <[email protected]> wrote: > Shai: > > method call overhead in this case is not insignificant because it is in > a very tight loop, and no, compiler cannot optimize it for you, we are not > inline-ing cuz we are in a java world. > > You are right, this breaks backward compatibility. But from 2.4 - 2.9, > we have done MUCH worse. :) > > -John > > > On Tue, Jun 8, 2010 at 10:09 AM, Shai Erera <[email protected]> wrote: > >> I guess I must be missing something fundamental here :). >> >> If Scorer is defined as you propose, and I create my Scorer which impls >> getDISI() as "return this" - what do I lose? What's wrong w/ Scorer already >> being a DISI? >> >> You mention "it is just inefficient to pay the method call overhead ..." - >> what overhead? Are you talking about the decorator delegating the call to >> the wrapped scorer? I really think the compiler can handle that, no? >> Especially if you make your nextDoc/advance final (which probably you >> should) ... >> >> That doesn't seem to justify an API change, break bw completely (even if >> we do it in 4.0 only) and change all the current Scorers ... >> >> Shai >> >> >> On Tue, Jun 8, 2010 at 8:01 PM, John Wang <[email protected]> wrote: >> >>> re: But Scorer is itself an iterator, so what prevents you from calling >>> >>> nextDoc and advance on it without score() >>> >>> Nothing. It is just inefficient to pay the method call overhead just to >>> overload score. >>> >>> re: If I were in your shoes, I'd simply provider a Query wrapper. If CSQ >>> >>> is not good enough I'd just develop my own. >>> >>> That is what I am doing. I am just proposing the change (see my first >>> email) as an improvement. >>> >>> re: Scorer is itself an iterator >>> >>> yes, that is the current definition. The point of the proposal is to make >>> this change. >>> >>> -John >>> >>> On Tue, Jun 8, 2010 at 9:45 AM, Shai Erera <[email protected]> wrote: >>> >>>> Well … I don't know the reason as well and always thought Scorer and >>>> Similarity are confusing. >>>> >>>> But Scorer is itself an iterator, so what prevents you from calling >>>> nextDoc and advance on it without score(). And what would the returned >>>> DISI do when nextDoc is called, if not delegate to its subs? >>>> >>>> If I were in your shoes, I'd simply provider a Query wrapper. If CSQ >>>> is not good enough I'd just develop my own. >>>> >>>> But perhaps others think differently? >>>> >>>> Shai >>>> >>>> On Tuesday, June 8, 2010, John Wang <[email protected]> wrote: >>>> > Hi Shai: >>>> > I am not sure I understand how changing Similarity would solve >>>> this problem, wouldn't you need the reader? >>>> > As for PayloadTermQuery, payload is not always the most efficient >>>> way of storing such data, especially when number of terms << numdocs. (I am >>>> not sure accessing the payload when you iterate is a good idea, but that is >>>> another discussion) >>>> > >>>> > Yes, what I described is exactly a simple CustomScoreQuery for a >>>> special use-case. The problem is also in CustomScoreQuery, where nextDoc >>>> and >>>> advance are calling the sub-scorers as a wrapper. This can be avoided if >>>> the >>>> Scorer returns an iterator instead. >>>> > >>>> > Separating scoring and doc iteration is a good idea anyway. I >>>> don't know the reason to combine them originally. >>>> > Thanks >>>> > -John >>>> > >>>> > >>>> > On Tue, Jun 8, 2010 at 8:47 AM, Shai Erera <[email protected]> wrote: >>>> > >>>> > So wouldn't it make sense to add some method to Similarity? Which >>>> receives the doc Id in question maybe ... just thinking here. >>>> > >>>> > Factoring Scorer like you propose would create 3 objects for >>>> scoring/iterating: Scorer (which really becomes an iterator), Similarity >>>> and >>>> CustomScoreFunction ... >>>> > >>>> > Maybe you can use CustomScoreQuery? or PayloadTermQuery? depends how >>>> you compute your age decay function (where you pull the data about the age >>>> of the document). >>>> > >>>> > Shai >>>> > >>>> > >>>> > On Tue, Jun 8, 2010 at 6:41 PM, John Wang <[email protected]> >>>> wrote: >>>> > Hi Shai: >>>> > Similarity in many cases is not sufficient for scoring. For >>>> example, to implement age decaying of a document (very useful for corpuses >>>> like news or tweets), you want to project the raw tfidf score onto a time >>>> curve, say f(x), to do this, you'd have a custom scorer that decorates the >>>> underlying scorer from your say, boolean query: >>>> > >>>> > >>>> > >>>> > public float score(){ return myFunc(innerScorer.score());} >>>> > This is fine, but then you would have to do this as well: >>>> > public int nextDoc(){ >>>> > >>>> > >>>> > return innerScorer.nextDoc();} >>>> > and also: >>>> > public int advance(int target){ return innerScorer.advance();} >>>> The difference here is that nextDoc and advance are called far more times >>>> as >>>> score. And you are introducing an extra method call for them, which is not >>>> insignificant for queries result in large recall sets. >>>> > >>>> > >>>> > >>>> > Hope this makes sense. >>>> > Thanks >>>> > -John >>>> > On Tue, Jun 8, 2010 at 5:02 AM, Shai Erera <[email protected]> wrote: >>>> > I'm not sure I understand what you mean - Scorer is a DISI itself, and >>>> the scoring formula is mostly controlled by Similarity. >>>> > >>>> > What will be the benefits of the proposed change? >>>> > >>>> > Shai >>>> > >>>> > On Tue, Jun 8, 2010 at 8:25 AM, John Wang <[email protected]> >>>> wrote: >>>> > >>>> > >>>> > >>>> > >>>> > Hi guys: >>>> > >>>> > I'd like to make a proposal to change the Scorer class/api to the >>>> following: >>>> > >>>> > >>>> > public abstract class Scorer{ >>>> > DocIdSetIterator getDocIDSetIterator(); >>>> > float score(int docid); >>>> > } >>>> > >>>> > Reasons: >>>> > >>>> > 1) To build a Scorer from an existing Scorer (e.g. that produces raw >>>> scores from tfidf), one would decorate it, and it would introduce overhead >>>> (in function calls) around nextDoc and advance, even if you just want to >>>> augment the score method which is called much fewer times. >>>> > >>>> > 2) The current contract forces scoring on the currentDoc in the >>>> underlying iterator. So once you pass "current", you can no longer score. >>>> In >>>> one of our use-cases, it is very inconvenient. >>>> > >>>> > What do you think? I can go ahead and open an issue and work on a >>>> patch if I get some agreement. >>>> > >>>> > Thanks >>>> > >>>> > -John >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: [email protected] >>>> For additional commands, e-mail: [email protected] >>>> >>>> >>> >> >
