All,

I have several questions regarding query response time and I would appreciate 
any help that can be provided.

We have a system that indexes approximately 200,000 documents per day at a 
fairly constant rate and holds them in a cfs-style file system directory index 
for 8 days. The index is approximately 50 GB when optimized, which we do 
semi-monthly.

We are running Lucene 2.3.2 with JRE 1.6.0_10 on CentOS 5 on 64-bit Dell 2950s 
(3 GHz dual/quad core processors with local ext3 RAID-5 15k disks, 
approximately 1.7 TB). The box has 16 GB of RAM and the JVM is allocated 11 GB 
(both -Xms and -Xmx).

Every 15 minutes, we flush the IndexWriter and create a new IndexSearcher to 
expose the newly indexed content.

Every hour, approximately one hour's worth of content (approximately 8,000 
documents) is deleted; we then flush the IndexWriter and create a new 
IndexSearcher.
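
To put the figures above in one place, here is a rough back-of-the-envelope 
sketch. The class and method names are made up for illustration, the inputs 
are the approximate numbers quoted in this message, and it assumes 1 GB = 
1024 * 1024 KB and a perfectly even indexing rate, so treat the results as 
estimates only.

```java
public class IndexLoadEstimate {
    // Approximate figures quoted in this message.
    static final long DOCS_PER_DAY = 200000L;
    static final int RETENTION_DAYS = 8;
    static final double INDEX_SIZE_GB = 50.0;
    static final int FLUSH_INTERVAL_MINUTES = 15;

    // Steady-state document count: daily volume times days retained.
    static long docsInIndex() {
        return DOCS_PER_DAY * RETENTION_DAYS;
    }

    // Documents added between two consecutive flushes (integer division).
    static long docsPerFlush() {
        return DOCS_PER_DAY * FLUSH_INTERVAL_MINUTES / (24 * 60);
    }

    // Documents aged out by each hourly delete (integer division).
    static long docsPerHour() {
        return DOCS_PER_DAY / 24;
    }

    // Average on-disk index size per document, assuming 1 GB = 1024 * 1024 KB.
    static double avgKbPerDoc() {
        return INDEX_SIZE_GB * 1024 * 1024 / docsInIndex();
    }

    public static void main(String[] args) {
        System.out.println("docs in index:  " + docsInIndex());
        System.out.println("docs per flush: " + docsPerFlush());
        System.out.println("docs per hour:  " + docsPerHour());
        System.out.println("avg KB per doc: " + avgKbPerDoc());
    }
}
```

Under those assumptions the index holds roughly 1.6 million documents, each 
15-minute flush exposes about 2,000 new documents, and the hourly delete 
removes a little over 8,000, which is consistent with the figure above.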

Q1: Given these settings, are there general rules of thumb for setting the 
MergeFactor, MaxMergeDocs, MaxBufferedDocs, and RAMBufferSizeMB?

We do a series of warm up searches every time we create a new IndexSearcher. 
Right now we are directly calling the IndexSearcher.search() method with a 
query, null filter, and 10 documents to return. We run searches against all of 
the index fields.
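
For discussion, the warm-up step can be sketched roughly as follows. This is 
a simplified illustration and not our production code: WarmedSearcherHolder 
and WarmupQuery are hypothetical names, the type parameter S stands in for 
IndexSearcher so the sketch needs no Lucene classes, and it assumes the new 
searcher is only made visible to incoming queries after every warm-up search 
has run.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class WarmedSearcherHolder<S> {

    /** Hypothetical stand-in for one warm-up search against a searcher. */
    public interface WarmupQuery<S> {
        void run(S searcher) throws Exception;
    }

    // The searcher currently visible to incoming queries.
    private final AtomicReference<S> current = new AtomicReference<S>();

    /** Returns the searcher that incoming queries should use (null before the first publish). */
    public S get() {
        return current.get();
    }

    /**
     * Runs every warm-up query against the fresh searcher, then swaps it in.
     * Returns the previous searcher (or null) so the caller can close it
     * once in-flight queries have finished.
     */
    public S publish(S fresh, List<WarmupQuery<S>> warmups) {
        for (WarmupQuery<S> query : warmups) {
            try {
                query.run(fresh);
            } catch (Exception e) {
                // Assumption: a failed warm-up search should not block the swap.
            }
        }
        return current.getAndSet(fresh);
    }
}
```

In our setup each WarmupQuery would wrap a call to IndexSearcher.search() 
with a query, a null filter, and 10 documents to return, as described above.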

Q1: Are there any rules of thumb for the number or complexity of warm up 
searches?
Q2: Is it important to "warm up" the query parser, analyzer, etc., or the 
ranges and sorting we use in queries?

When the system is receiving regular queries, between 1 and 5 per second for 
example, the search response times are extremely fast (under 500 ms) and mostly 
independent of query complexity. We see slower query responses (on the order of 
2-4 seconds) for the first few queries when using a newly created 
IndexSearcher. However, the extremely fast response times return quickly and 
continue.

When the system has not received any search requests for a period of time, as 
little as 5 seconds, the query response time for even a simple query starts 
climbing (5-8 seconds), and the longer the idle period between queries, the 
longer the query response time (growing to 15-30 seconds if the idle time is 
30 seconds to a minute). NOTE: the system is still indexing new content and 
removing old content when there are no incoming queries.
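
One way to quantify the idle effect described above is a small probe that 
waits for a given idle period and then times a single query. The sketch below 
is illustrative only: IdleLatencyProbe is a hypothetical name, and the query 
is passed in as a plain Runnable so that the probe itself does not depend on 
Lucene.

```java
public class IdleLatencyProbe {

    /**
     * For each idle period, sleeps that long and then times one run of the
     * query. Returns the elapsed time in milliseconds for each period.
     * If the thread is interrupted, the remaining entries are left at 0.
     */
    public static long[] measure(Runnable query, long[] idleMillis) {
        long[] elapsedMillis = new long[idleMillis.length];
        for (int i = 0; i < idleMillis.length; i++) {
            try {
                Thread.sleep(idleMillis[i]);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop measuring.
                Thread.currentThread().interrupt();
                break;
            }
            long start = System.nanoTime();
            query.run();
            elapsedMillis[i] = (System.nanoTime() - start) / 1000000L;
        }
        return elapsedMillis;
    }
}
```

Calling measure() with idle periods such as 5, 30, and 60 seconds, and a 
Runnable that issues the same simple search each time, would give numbers 
that can be compared directly with the response times reported here.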

Q3: Is there a known issue where the IndexSearcher cache empties over time?

Finally, there are times when the query response times go completely off the 
charts, into the hundreds of seconds.

Q4: Is it possible that this is due to segments being merged together? If so, 
besides the MergeFactor and related settings, are there ways to mitigate this?

Thanks in advance for any help you can provide.

Regards,
Dan

Dan O'Connor
SVP, Engineering
Acquire Media <http://www.acquiremedia.com/>
77 South Bedford Street, Suite 350
Burlington, MA 01803
e: docon...@acquiremedia.com
o: 781-250-0565
f: 877-861-7724

