Makes sense.
I didn't think 32 was the empirically determined magic number ;)

Are you planning to do a patch for this?

-John

On Thu, Jan 8, 2009 at 1:27 AM, Paul Elschot <paul.elsc...@xs4all.nl> wrote:

> John,
>
> Continuing, see below.
>
> On Wednesday 07 January 2009 14:24:15 Paul Elschot wrote:
> > On Wednesday 07 January 2009 07:25:17 John Wang wrote:
> > > Hi:
> > >
> > >    The default buffer size (for docid, score, etc.) is 32 in TermScorer.
> > >
> > >     We have a large index in which some terms have very dense doc sets.
> > > By increasing the buffer size we see very dramatic performance
> > > improvements.
> > >
> > >     With our index (which may not be typical), here are some numbers for
> > > buffer size vs. performance improvement on our query (a large OR query):
> > >
> > >     Buffer size   Improvement
> > >     2042          22.0 %
> > >     4084          39.1 %
> > >     8172          51.1 %
> > >
> > >     I understand this may not be suitable for every application, so do
> > > you think it makes sense to make this buffer size configurable?
> > >
> >
> > Ideally the TermScorer buffer size could be set to a size depending on
> > the query structure, but there is no facility for this yet.
> > For OR queries larger buffers help, but not for AND queries.
> > See also LUCENE-430 on reducing buffer sizes for the underlying
> > TermDocs for very sparse doc sets.
>
> It may be possible to change the TermScorer buffer size dynamically.
> For OR queries TermScorer.next() is used, and for AND queries
> TermScorer.skipTo() is used.
> That means that when the buffer runs out during TermScorer.next(),
> it could be enlarged, for example by doubling (or quadrupling) the size
> to a configurable maximum of 8K or even 16K, see above. When
> TermScorer.skipTo() runs out of the buffer it could leave the buffer
> size unchanged.
>
> This involves some memory allocation during search.
> That is unusual, but it could be worthwhile given the
> performance improvement.
>
> Regards,
> Paul Elschot
>
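
For illustration, the dynamic sizing Paul describes might be sketched roughly
as below. This is not Lucene's TermScorer and not a patch: the class
GrowingScorerBuffer and its method names are hypothetical, and the starting
size of 32 and cap of 8192 are just the figures mentioned in this thread. The
idea is only that running out of buffer during next() doubles the size up to
a configurable maximum, while running out during skipTo() leaves it alone.

```java
// Hypothetical sketch of the growth policy discussed above.
// Not Lucene code: class and method names are assumptions for illustration.
final class GrowingScorerBuffer {
    private final int maxSize; // configurable cap, e.g. 8192 or 16384
    private int[] docs;        // buffered doc ids
    private int[] freqs;       // buffered term frequencies

    GrowingScorerBuffer(int initialSize, int maxSize) {
        if (initialSize <= 0 || maxSize < initialSize) {
            throw new IllegalArgumentException("need 0 < initialSize <= maxSize");
        }
        this.maxSize = maxSize;
        this.docs = new int[initialSize];
        this.freqs = new int[initialSize];
    }

    int capacity() {
        return docs.length;
    }

    /**
     * Called when next() has consumed the whole buffer (the OR-query path).
     * Doubles the buffer up to maxSize. The old contents are already used
     * up, so nothing is copied; the caller refills the new arrays.
     */
    int refillSizeAfterNext() {
        int current = docs.length;
        if (current < maxSize) {
            int grown = (int) Math.min((long) current * 2, (long) maxSize);
            docs = new int[grown];   // this is the allocation during search
            freqs = new int[grown];
        }
        return docs.length;
    }

    /**
     * Called when skipTo() runs past the buffer (the AND-query path).
     * The size is left unchanged.
     */
    int refillSizeAfterSkipTo() {
        return docs.length;
    }
}
```

With a start of 32 and a cap of 8192, a scorer driven only by next() would
reallocate at most eight times (32 to 64, and so on up to 8192), so the
allocation cost during search stays bounded per term.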
