Hello, I am currently evaluating Lucene 3.5.0 as an upgrade from 3.0.3, and for my usage the most important parameter is index writing throughput. To that end, I have been running various tests, but I am seeing contradictory results from different setups, which hopefully someone with better knowledge of Lucene's internals could explain...
First, let me describe my usage of Lucene, which is common across all of these cases.

1. Terms: non-analyzed strings or integral types, mostly. No free form text values on fields.
2. All indexed fields are stored.
3. Multiple threads per index writer, in the overall application currently capped at 4.
4. Document deletes are performed with each index update, using a simple string term to identify the document.
5. Default IndexWriter config settings are used, i.e. directory type, merge policy, RAM buffer size, etc.
6. Typical data size for an index is anywhere from a few hundred K docs up to a few hundred M.
7. Hardware config:
   - kernel 2.6.16-60 SMP (SuSE Enterprise Server 10)
   - 16x CPU
   - 16G RAM
   - ReiserFS partition for index data (more on this below)

Here is where things diverge though.

The first use case is a standalone performance test, which writes 1M documents containing 4 fields (2 string, 2 numeric) to a single index using 10 worker threads. In this case, I do not see any writing performance degradation when going from 3.0.3 to 3.5.

The second setup is a distributed multi-threaded client server application, where Lucene is used on the server to implement the search functionality. Clients have the ability to submit searchable data for indexing, as well as to run queries against the data. I realize this is a very generic description, and if needed could provide more specifics later. For now, let's say the second test runs on one such client, and submits 3 million records for the server to process (and also index via Lucene). Total time taken is then reported. But when running the test above, I can definitely observe a consistent increase in test times when the only thing changing is Lucene going from 3.0.3 to 3.5.0, on the order of 15-35%. How could I reconcile this discrepancy?
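For anyone who wants to reproduce the shape of the standalone test, here is a minimal sketch of a multi-threaded throughput harness. It is not the actual test code. The class name `WriteThroughputHarness` and the `DocSink` interface are hypothetical, and the harness deliberately has no Lucene dependency: the sink is where the real indexing call would go (for example `IndexWriter.updateDocument(Term, Document)` on a shared writer), so the same harness can be run unchanged against both Lucene versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class WriteThroughputHarness {

    /** Hypothetical stand-in for the per-document indexing call. */
    public interface DocSink {
        void write(long docId) throws Exception;
    }

    /**
     * Pushes totalDocs documents through the sink from a fixed pool of
     * worker threads and returns the observed rate in documents per second.
     */
    public static double run(final DocSink sink, int threads, final long totalDocs)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        final AtomicLong next = new AtomicLong();
        // Each worker claims document ids from the shared counter until none remain.
        Callable<Void> worker = () -> {
            long id;
            while ((id = next.getAndIncrement()) < totalDocs) {
                sink.write(id);
            }
            return null;
        };
        try {
            long start = System.nanoTime();
            List<Future<Void>> pending = new ArrayList<Future<Void>>();
            for (int i = 0; i < threads; i++) {
                pending.add(pool.submit(worker));
            }
            for (Future<Void> f : pending) {
                f.get(); // rethrows any worker failure
            }
            long elapsedNanos = Math.max(1L, System.nanoTime() - start);
            return totalDocs / (elapsedNanos / 1e9);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder sink that only counts; a real run would build a
        // 4-field document here and hand it to the shared IndexWriter.
        final AtomicLong written = new AtomicLong();
        double rate = run(id -> written.incrementAndGet(), 10, 1000000L);
        System.out.printf("wrote %d docs at %.0f docs/sec%n", written.get(), rate);
    }
}
```

Running it with the thread count and document count of each setup (10 threads in the standalone test, up to 4 per writer in the application) should show whether the slowdown tracks the writer itself or something around it.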
My theory at this point is that the combination of the kernel above and ReiserFS (the default FS for the distro) is somehow making index writing in 3.5.0 slower, possibly due to the BKL (Big Kernel Lock) issue, but only when used in a heavily multi-threaded system. Unfortunately, I currently have no ext3 partitions, nor the ability to upgrade the kernel on the system, to prove or disprove this. Has anyone experienced issues like this in a similar setup, or maybe benchmarked Lucene across different file system types and release versions?

Thanks,
-V