All,

I have several questions regarding query response time and would appreciate any help you can provide.
We have a system that indexes approximately 200,000 documents per day at a fairly constant rate and keeps them for 8 days in a file-system directory index using the compound file (CFS) format. The index is approximately 50 GB when optimized, which we do semi-monthly.

We are running Lucene 2.3.2 with JRE 1.6.0_10 on CentOS 5 on 64-bit Dell 2950s (3 GHz dual/quad-core processors) with local ext3 RAID-5 15k RPM disks (approximately 1.7 TB). The box has 16 GB of RAM, and the JVM is allocated 11 GB (both -Xms and -Xmx).

Every 15 minutes, we flush the IndexWriter and create a new IndexSearcher to expose the newly indexed content. Every hour, approximately one hour's worth of content (approximately 8,000 documents) is deleted; we then flush the IndexWriter and create a new IndexSearcher.

Q1: Given these settings, are there general rules of thumb for setting MergeFactor, MaxMergeDocs, MaxBufferedDocs, and RAMBufferSizeMB?

We run a series of warm-up searches every time we create a new IndexSearcher. Right now we call IndexSearcher.search() directly with a query, a null filter, and 10 documents to return. We run searches against all of the index fields.

Q2: Are there any rules of thumb for the number or complexity of warm-up searches? Is it also important to warm up the query parser, analyzer, etc., or the ranges and sorting we use in queries?

When the system is receiving regular queries (for example, between 1 and 5 per second), search response times are extremely fast (under 500 ms) and mostly independent of query complexity. We see slower responses (on the order of 2-4 seconds) for the first few queries against a newly created IndexSearcher, but the fast response times return quickly and continue.
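The reopen cycle described above (open a new IndexSearcher, run warm-up searches, then expose it to queries) can be sketched as a warm-then-swap holder. This is a minimal sketch under stated assumptions, not Lucene's API: WarmableSearcher, SearcherOpener, and SearcherHolder are hypothetical names standing in for IndexSearcher and whatever code opens it, so the example compiles without Lucene on the classpath. It sticks to Java 5/6 language features to match the JRE 1.6 mentioned above.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical minimal stand-in for org.apache.lucene.search.IndexSearcher,
// so the sketch compiles without Lucene on the classpath.
interface WarmableSearcher {
    /** Runs a query and returns the number of hits (placeholder signature). */
    int search(String query, int maxHits);
}

// Hypothetical factory that opens a searcher on the current index state.
interface SearcherOpener<S extends WarmableSearcher> {
    S open();
}

/**
 * Keeps one "live" searcher for incoming queries. A newly opened searcher is
 * warmed first and only then published, so live traffic keeps using the old
 * searcher while warm-up runs.
 */
final class SearcherHolder<S extends WarmableSearcher> {
    private final AtomicReference<S> live = new AtomicReference<S>();
    private final List<String> warmupQueries;

    SearcherHolder(List<String> warmupQueries) {
        this.warmupQueries = warmupQueries;
    }

    /** The searcher incoming queries should use now (null before the first reopen). */
    S get() {
        return live.get();
    }

    /**
     * Opens and warms a new searcher, then atomically swaps it in. Returns the
     * previous searcher (or null); the caller is responsible for closing it once
     * any searches still running against it have finished.
     */
    S reopen(SearcherOpener<S> opener) {
        S fresh = opener.open();
        for (String q : warmupQueries) {
            fresh.search(q, 10); // 10 hits, like the warm-up searches described above
        }
        return live.getAndSet(fresh);
    }
}
```

The idea is that incoming queries never hit a searcher that has not yet run its warm-up searches. Whether this removes the first-query slowdown in practice likely depends on how closely the warm-up queries resemble real traffic (fields, ranges, and sorts).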
When the system has not received any search requests for a period of time, as little as 5 seconds, the response time for even a simple query starts climbing (5-8 seconds), and the longer the idle period between queries, the longer the response time (growing to 15-30 seconds after 30 seconds to a minute of idle time). NOTE: the system is still indexing new content and removing old content when there are no incoming queries.

Q3: Is there a known issue where the IndexSearcher cache empties over time?

Finally, there are times when query response times go completely off the charts, to hundreds of seconds.

Q4: Is it possible that this is due to segments being merged together? If so, besides MergeFactor and the related settings, are there ways to mitigate this?

Thanks in advance for any help you can provide.

Regards,
Dan

Dan O'Connor
SVP, Engineering
Acquire Media <http://www.acquiremedia.com/>
77 South Bedford Street, Suite 350
Burlington, MA 01803
e: docon...@acquiremedia.com
o: 781-250-0565
f: 877-861-7724