Re: TermScorer default buffer size

Paul Elschot Thu, 08 Jan 2009 23:52:48 -0800

On Friday 09 January 2009 05:29:15 John Wang wrote:
> Makes sense.
> I didn't think 32 was the empirically determined magic number ;)


That number does have a history, but I don't know the details.
 
> Are you planning to do a patch for this?

No, but could you open an issue and mention the performance
improvements?

Regards,
Paul Elschot


> 
> -John
> 
> On Thu, Jan 8, 2009 at 1:27 AM, Paul Elschot <paul.elsc...@xs4all.nl> wrote:
> 
> > John,
> >
> > Continuing, see below.
> >
> > On Wednesday 07 January 2009 14:24:15 Paul Elschot wrote:
> > > On Wednesday 07 January 2009 07:25:17 John Wang wrote:
> > > > Hi:
> > > >
> > > >    The default buffer size (for docid,score etc) is 32 in TermScorer.
> > > >
> > > >     We have a large index in which some terms have very dense doc sets.
> > > > By increasing the buffer size we see very dramatic performance
> > > > improvements.
> > > >
> > > >     With our index (which may not be typical), here are some numbers
> > > > relating buffer size to performance for our query (a large OR query):
> > > >
> > > >     Buffer size   Improvement
> > > >     2042          22.0 %
> > > >     4084          39.1 %
> > > >     8172          51.1 %
> > > >
> > > >     I understand this may not be suitable for every application, so do
> > > > you think it makes sense to make this buffer size configurable?
> > > >
> > >
> > > Ideally the TermScorer buffer size could be set to a size depending on
> > > the query structure, but there is no facility for this yet.
> > > For OR queries larger buffers help, but not for AND queries.
> > > See also LUCENE-430 on reducing buffer sizes for the underlying
> > > TermDocs for very sparse doc sets.
> >
> > It may be possible to change the TermScorer buffer size dynamically.
> > For OR queries, TermScorer.next() is used; for AND queries,
> > TermScorer.skipTo() is used.
> > So when the buffer runs out during TermScorer.next(), it could be
> > enlarged, for example by doubling (or quadrupling) its size up to a
> > configurable maximum of 8K or even 16K (see the numbers above). When
> > TermScorer.skipTo() runs out of the buffer, it could leave the buffer
> > size unchanged.
> >
> > This involves some memory allocation during search.
> > That is unusual, but it could be worthwhile given the
> > performance improvement.
> >
> > Regards,
> > Paul Elschot
> >
>
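
The dynamic resizing proposed in the quoted message (enlarge the buffer when next() runs out of it, leave it unchanged when skipTo() does) might be sketched in Java roughly as follows. This is a hypothetical, self-contained illustration, not Lucene's TermScorer: the class GrowingTermBuffer, the PostingsSource interface (a stand-in for the underlying TermDocs), the doubling policy, and the initial/maximum sizes are all assumptions. For simplicity, skipTo() here scans forward through refilled buffers rather than using the postings' skip data.

```java
/**
 * Hypothetical sketch (not Lucene's TermScorer): a doc/freq buffer that
 * doubles when next() exhausts a full buffer and keeps its size on skipTo().
 */
public class GrowingTermBuffer {

    /** Hypothetical stand-in for the underlying postings (e.g. a TermDocs). */
    interface PostingsSource {
        /** Fills docs/freqs from index 0; returns the number of entries read, 0 at the end. */
        int read(int[] docs, int[] freqs);
    }

    /** Array-backed postings for demonstration; every doc gets freq 1. */
    static final class ArraySource implements PostingsSource {
        private final int[] docIds;
        private int next = 0;

        ArraySource(int[] docIds) { this.docIds = docIds; }

        public int read(int[] docs, int[] freqs) {
            int n = Math.min(docs.length, docIds.length - next);
            for (int i = 0; i < n; i++) {
                docs[i] = docIds[next + i];
                freqs[i] = 1;
            }
            next += n;
            return n;
        }
    }

    private final PostingsSource source;
    private final int maxSize;   // configurable cap (assumed), e.g. 8K or 16K
    private int[] docs;
    private int[] freqs;
    private int count = 0;       // number of valid entries in the buffer
    private int pos = -1;        // current index within the buffer

    public GrowingTermBuffer(PostingsSource source, int initialSize, int maxSize) {
        this.source = source;
        this.maxSize = maxSize;
        this.docs = new int[initialSize];
        this.freqs = new int[initialSize];
    }

    /** Refills the buffer; if grow is set and the last fill was full, doubles it first (capped). */
    private boolean refill(boolean grow) {
        if (grow && count == docs.length && docs.length < maxSize) {
            int newSize = Math.min(docs.length * 2, maxSize);
            docs = new int[newSize];   // the allocation during search mentioned in the thread
            freqs = new int[newSize];
        }
        count = source.read(docs, freqs);
        pos = 0;
        return count > 0;
    }

    /** Sequential iteration, as used for OR queries: may grow the buffer. */
    public boolean next() {
        if (++pos < count) return true;
        return refill(true);
    }

    /** Moves to the first doc >= target, as used for AND queries: never grows the buffer. */
    public boolean skipTo(int target) {
        pos++;
        while (true) {
            for (; pos < count; pos++) {
                if (docs[pos] >= target) return true;
            }
            if (!refill(false)) return false;  // refill() resets pos to 0
        }
    }

    public int doc() { return docs[pos]; }
    public int freq() { return freqs[pos]; }
    public int bufferSize() { return docs.length; }

    public static void main(String[] args) {
        int[] ids = new int[100];
        for (int i = 0; i < ids.length; i++) ids[i] = i * 2;  // even doc ids 0..198

        GrowingTermBuffer orScan = new GrowingTermBuffer(new ArraySource(ids), 4, 32);
        int seen = 0;
        while (orScan.next()) seen++;
        System.out.println("docs=" + seen + " bufferSize=" + orScan.bufferSize());
        // prints "docs=100 bufferSize=32"

        GrowingTermBuffer andScan = new GrowingTermBuffer(new ArraySource(ids), 4, 32);
        andScan.skipTo(50);
        System.out.println("doc=" + andScan.doc() + " bufferSize=" + andScan.bufferSize());
        // prints "doc=50 bufferSize=4"
    }
}
```

In this sketch the buffer only grows when the previous fill came back completely full, so short postings lists never trigger an enlargement, and maxSize bounds the extra memory allocated during search.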
