Makes sense. I didn't think 32 was the empirically determined magic number ;)
Are you planning to do a patch for this?

-John

On Thu, Jan 8, 2009 at 1:27 AM, Paul Elschot <paul.elsc...@xs4all.nl> wrote:

> John,
>
> Continuing, see below.
>
> On Wednesday 07 January 2009 14:24:15 Paul Elschot wrote:
> > On Wednesday 07 January 2009 07:25:17 John Wang wrote:
> > > Hi:
> > >
> > > The default buffer size (for docid,score etc) is 32 in TermScorer.
> > >
> > > We have a large index with some terms to have very dense doc sets. By
> > > increasing the buffer size we see very dramatic performance improvements.
> > >
> > > With our index (may not be typical), here are some numbers with buffer
> > > size w.r.t. performance in our query (a large OR query):
> > >
> > > Buffer-size   improvement
> > > 2042        - 22.0 %
> > > 4084        - 39.1 %
> > > 8172        - 51.1 %
> > >
> > > I understand this may not be suitable for every application, so do you
> > > think it makes sense to make this buffer size configurable?
> > >
> >
> > Ideally the TermScorer buffer size could be set to a size depending on
> > the query structure, but there is no facility for this yet.
> > For OR queries larger buffers help, but not for AND queries.
> > See also LUCENE-430 on reducing buffer sizes for the underlying
> > TermDocs for very sparse doc sets.
>
> It may be possible to change the TermScorer buffer size dynamically.
> For OR queries TermScorer.next() is used, and for AND queries
> TermScorer.skipTo() is used.
> That means that when the buffer runs out during TermScorer.next(),
> it could be enlarged, for example by doubling (or quadrupling) the size
> to a configurable maximum of 8K or even 16K, see above. When
> TermScorer.skipTo() runs out of the buffer it could leave the buffer
> size unchanged.
>
> This involves some memory allocation during search.
> That is unusual, but it could be worthwhile given the
> performance improvement.
>
> Regards,
> Paul Elschot
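
For illustration only, here is a rough Java sketch of the growing-buffer idea described in the quoted message. It is not Lucene's TermScorer and not a proposed patch: the class GrowingDocBuffer, its fields, and the plain int[] standing in for TermDocs are all hypothetical names made up for this example. It only shows the policy: grow on refill during next(), keep the size unchanged during skipTo().

```java
/** Hypothetical stand-in for a scorer's doc buffer; NOT Lucene's TermScorer. */
class GrowingDocBuffer {
    private final int[] postings; // sorted doc ids, standing in for TermDocs (assumption)
    private final int maxSize;    // configurable upper bound, e.g. 8192
    private int[] docs;           // current buffer
    private int filled = 0;       // number of valid entries in docs
    private int pointer = 0;      // next buffer entry to hand out
    private int source = 0;       // read position in postings
    private int doc = -1;         // current doc id

    GrowingDocBuffer(int[] postings, int initialSize, int maxSize) {
        this.postings = postings;
        this.maxSize = maxSize;
        this.docs = new int[initialSize];
    }

    int bufferSize() { return docs.length; }

    int doc() { return doc; }

    /** Sequential iteration (the OR case): double the buffer when it runs out. */
    boolean next() {
        if (pointer >= filled) {
            // Skip growth on the very first fill; afterwards double up to maxSize.
            if (filled > 0 && docs.length < maxSize) {
                docs = new int[Math.min(docs.length * 2, maxSize)];
            }
            if (!refill()) return false;
        }
        doc = docs[pointer++];
        return true;
    }

    /**
     * Skipping (the AND case): refill at the current size, never grow.
     * A real scorer would use the skip data instead of this linear scan.
     */
    boolean skipTo(int target) {
        while (true) {
            while (pointer < filled) {
                doc = docs[pointer++];
                if (doc >= target) return true;
            }
            if (!refill()) return false;
        }
    }

    /** Copies the next chunk of postings into the buffer; false when exhausted. */
    private boolean refill() {
        int n = Math.min(docs.length, postings.length - source);
        System.arraycopy(postings, source, docs, 0, n);
        source += n;
        filled = n;
        pointer = 0;
        return n > 0;
    }
}
```

With an initial size of 32 and a maximum of 8192, a dense term iterated with next() reaches the maximum after eight refills, while a term that is only ever advanced with skipTo() stays at 32 and allocates nothing extra.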