Re: Improving sort performance

Jeff Rodenburg Sat, 22 Oct 2005 15:42:09 -0700

Very cool. Any known drawbacks to this approach?


On 10/22/05, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> FunctionQuery matches all documents, so you normally want to use it as
> part of a BooleanQuery with another mandatory clause. That will cause only
> documents matching the other clause to be scored (the BooleanScorer takes
> care of that logic).
>
> The score FunctionQuery produces is from the function alone (no relevancy
> stuff like idf, tf, lengthNorm, or anything else).
>
> If you want to sort by that score alone, then boost the other parts of the
> query to 0.
>
> So, (MyQuery, sorted by MyFunkySort), becomes
> ((+MyQuery^0 MyFunctionQuery), sorted by score)
>
> -Yonik
> Now hiring -- http://forms.cnet.com/slink?231706
>
> On 10/22/05, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:
> >
> > This is really interesting, I haven't revved our code to this version
> > yet. Does the score returned by FunctionQuery supersede underlying
> > relevance scoring or is it rolled in at some base class?
> >
> > -- j
> >
> > On 10/22/05, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > >
> > > I'm not sure what type of score you are trying to do, but maybe
> > > FunctionQuery would help.
> > > http://issues.apache.org/jira/browse/LUCENE-446
> > >
> > > -Yonik
> > > Now hiring -- http://forms.cnet.com/slink?231706
> > >
> > > On 10/22/05, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:
> > > >
> > > > I have a custom sort that completes calculations on-the-fly, similar
> > > > to the LIA distance sort. SortField type is Float. It works, but I need
> > > > better performance. I'm wondering if there's a better way to do this.
> > > >
> > > > As a rule, the number of results returned in a given search will most
> > > > often be a fraction of the total documents in the search indexes. For
> > > > example, 1000 results would be a rather large result set for what I'm
> > > > expecting. The aggregate index document count is in the range of 20
> > > > million.
> > > >
> > > > The standard process of looping through the TermDocs from readers for
> > > > the aggregate index seems wasteful in this scenario, given the relative
> > > > number of results to the overall size of the index. What are my options
> > > > here?
> > > >
> > > > Thanks
> > > > jeff
> > > >
> > > >
> > >
> > >
> >
> >
>
>
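
A minimal sketch of the scoring arithmetic described in the quoted reply,
(+MyQuery^0 MyFunctionQuery) sorted by score, may help. It is a toy model
only: ToyDoc, ToyHit, BoostZeroSketch and the sample values are hypothetical
and are not Lucene classes or the LUCENE-446 API. It assumes the combined
score is simply boost * relevance + function value, and it ignores factors
such as coord and queryNorm that real scoring would also apply.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: none of these types are Lucene classes.
final class ToyDoc {
    final int id;
    final boolean matchesMyQuery; // would the mandatory clause match this doc?
    final float relevance;        // stand-in for the tf/idf-style score of MyQuery
    final float functionValue;    // stand-in for the value a function query returns

    ToyDoc(int id, boolean matchesMyQuery, float relevance, float functionValue) {
        this.id = id;
        this.matchesMyQuery = matchesMyQuery;
        this.relevance = relevance;
        this.functionValue = functionValue;
    }
}

final class ToyHit {
    final int id;
    final float score;

    ToyHit(int id, float score) {
        this.id = id;
        this.score = score;
    }
}

final class BoostZeroSketch {
    // Models (+MyQuery^boost MyFunctionQuery) sorted by descending score.
    static List<ToyHit> search(List<ToyDoc> index, float mandatoryBoost) {
        List<ToyHit> hits = new ArrayList<>();
        for (ToyDoc d : index) {
            // The mandatory clause decides which documents are scored at all,
            // even though the function clause would match every document.
            if (!d.matchesMyQuery) {
                continue;
            }
            // With mandatoryBoost == 0 the relevance term drops out and only
            // the function value is left in the score.
            float score = mandatoryBoost * d.relevance + d.functionValue;
            hits.add(new ToyHit(d.id, score));
        }
        hits.sort(Comparator.comparingDouble((ToyHit h) -> h.score).reversed());
        return hits;
    }

    // Hypothetical sample data for illustration.
    static List<ToyDoc> sampleIndex() {
        List<ToyDoc> index = new ArrayList<>();
        index.add(new ToyDoc(1, true, 4.0f, 2.0f));
        index.add(new ToyDoc(2, true, 0.1f, 5.0f));
        index.add(new ToyDoc(3, false, 0.0f, 9.0f)); // fails the mandatory clause
        index.add(new ToyDoc(4, true, 0.5f, 3.5f));
        return index;
    }
}
```

In this model, search(sampleIndex(), 0.0f) ranks the documents purely by
function value and never returns document 3, while a boost of 1.0f lets the
relevance term reorder the hits.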
