On Friday 09 January 2009 05:29:15 John Wang wrote:
> Makes sense.
> I didn't think 32 was the empirically determined magic number ;)

That number does have a history, but I don't know the details.

> Are you planning to do a patch for this?

No, but could you open an issue and mention the performance improvements?

Regards,
Paul Elschot

>
> -John
>
> On Thu, Jan 8, 2009 at 1:27 AM, Paul Elschot <paul.elsc...@xs4all.nl> wrote:
>
> > John,
> >
> > Continuing, see below.
> >
> > On Wednesday 07 January 2009 14:24:15 Paul Elschot wrote:
> > > On Wednesday 07 January 2009 07:25:17 John Wang wrote:
> > > > Hi:
> > > >
> > > > The default buffer size (for docid,score etc) is 32 in TermScorer.
> > > >
> > > > We have a large index with some terms to have very dense doc sets. By
> > > > increasing the buffer size we see very dramatic performance improvements.
> > > >
> > > > With our index (may not be typical), here are some numbers with buffer
> > > > size w.r.t. performance in our query (a large OR query):
> > > >
> > > > Buffer-size   improvement
> > > > 2042        - 22.0 %
> > > > 4084        - 39.1 %
> > > > 8172        - 51.1 %
> > > >
> > > > I understand this may not be suitable for every application, so do you
> > > > think it makes sense to make this buffer size configurable?
> > > >
> > >
> > > Ideally the TermScorer buffer size could be set to a size depending on
> > > the query structure, but there is no facility for this yet.
> > > For OR queries larger buffers help, but not for AND queries.
> > > See also LUCENE-430 on reducing buffer sizes for the underlying
> > > TermDocs for very sparse doc sets.
> >
> > It may be possible to change the TermScorer buffer size dynamically.
> > For OR queries TermScorer.next() is used, and for AND queries
> > TermScorer.skipTo() is used.
> > That means that when the buffer runs out during TermScorer.next(),
> > it could be enlarged, for example by doubling (or quadrupling) the size
> > to a configurable maximum of 8K or even 16K, see above. When
> > TermScorer.skipTo() runs out of the buffer it could leave the buffer
> > size unchanged.
> >
> > This involves some memory allocation during search.
> > That is unusual, but it could be worthwhile given the
> > performance improvement.
> >
> > Regards,
> > Paul Elschot
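
For illustration, the dynamic sizing idea in the quoted message (grow the buffer when next() runs out of it, leave it alone when skipTo() does) could look roughly like the sketch below. This is not Lucene's TermScorer: the class GrowingBufferIterator, its fields, and the plain int[] standing in for the underlying TermDocs are all hypothetical, and scores are left out. It assumes sorted, unique doc ids.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Lucene code: a doc-id iterator whose read-ahead
 * buffer grows only when sequential next() calls exhaust a full buffer.
 */
final class GrowingBufferIterator {
    static final int INITIAL_SIZE = 32;  // the default size mentioned in the thread

    private final int[] postings;  // stand-in for TermDocs: sorted, unique doc ids
    private final int maxSize;     // configurable cap, e.g. 8192 or 16384
    private int postingsPos = 0;   // next unread position in postings
    private int[] buffer = new int[INITIAL_SIZE];
    private int filled = 0;        // number of valid entries in buffer
    private int pointer = 0;       // index of the next buffered entry to return
    private int doc = -1;

    GrowingBufferIterator(int[] sortedDocIds, int maxSize) {
        this.postings = sortedDocIds;
        this.maxSize = Math.max(INITIAL_SIZE, maxSize);
    }

    /** Sequential access (the OR case): double the buffer when it runs out. */
    boolean next() {
        if (pointer == filled) {
            // Grow only if the last refill used the whole buffer and the cap allows it.
            if (filled == buffer.length && buffer.length < maxSize) {
                buffer = new int[Math.min(buffer.length * 2, maxSize)];
            }
            if (!refill()) {
                return false;
            }
        }
        doc = buffer[pointer++];
        return true;
    }

    /** Skipping access (the AND case): refill at the current size, never grow. */
    boolean skipTo(int target) {
        // First look through what is already buffered.
        while (pointer < filled) {
            int candidate = buffer[pointer++];
            if (candidate >= target) {
                doc = candidate;
                return true;
            }
        }
        // Buffer exhausted: jump ahead in the source, keeping the buffer size.
        int idx = Arrays.binarySearch(postings, postingsPos, postings.length, target);
        postingsPos = idx >= 0 ? idx : -idx - 1;
        if (!refill()) {
            return false;
        }
        doc = buffer[pointer++];
        return true;
    }

    /** Copies the next chunk of postings into the buffer; false when none are left. */
    private boolean refill() {
        int n = Math.min(buffer.length, postings.length - postingsPos);
        System.arraycopy(postings, postingsPos, buffer, 0, n);
        postingsPos += n;
        filled = n;
        pointer = 0;
        return n > 0;
    }

    int doc() { return doc; }

    int bufferSize() { return buffer.length; }
}
```

The only allocation during search happens in next(), and with doubling it occurs at most a handful of times per term before the cap is reached (32 to 8192 is eight steps). A term that is only ever advanced with skipTo() keeps its initial 32-entry buffer.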