Re: BufferedIndexInput.readByte performance

Ken Krugler Fri, 26 May 2006 10:22:35 -0700

On Friday 26 May 2006 16:14, Michael Chan wrote:

Hi,

 > I have a 5gb index containing 2mil documents and am trying to run

 1mil+ queries against it. Most of the queries are SpanQueries and it
 occurs to me that the search performance is quite slow when using 2, 3
 SpanOrQueries nested inside a SpanNearQuery, which in turn is nested
 inside another SpanNearQuery. The response time is around 3-5 seconds
 even when the index is stored as a RAMDirectory, and
 BufferedIndexInput.readByte() appears to be the bottleneck. Is this

 > performance typical? As I don't need any sorting of the results and

That is indeed typical.

 > only need the number of results returned, is there anything, besides

 Field.setOmitNorm(true), I can modify to improve performance?


A few things might help:
- use getSpans() on the scorer of the query, iterate the resulting Spans
  and count the number of different doc values.
  This saves the scoring and the sorting on score value.
- Sort the queries alphabetically, to try and maximize cache usage.
- Increase the skip interval when creating the index, by default lucene uses
  16, but nutch uses a higher value. I've never done this myself, but
  you could specifically ask on how to do this.

I'm curious how increasing the skip interval would improveperformance. From what I've read on-list and in-code, having asmaller skip interval means trading more memory for faster termlookup. I think Nutch uses a larger default value (128) to betterhandle big (e.g. 10M) indexes, at the expense of slightly slowerperformance.

Also, the use of a sorted index would seem to offer the biggestpotential speed win, though that would require a good way of orderingthe index, and it seems like there would be lots of tuning requiredto pick the right cut-off value for searches.


Thanks,

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: BufferedIndexInput.readByte performance

Reply via email to