I'm preparing to help a company run a scalability test and decide whether or not to use Lucene. Relevant particulars for the test include:
1. 2 pairs of indices. Each pair has 1 index with about 7.5 million small documents and 1 index with about 1 million large documents. Each index also carries a substantial number of (small) fields per document in addition to the document content itself.
2. Searching will be done using a node for each index pair -- i.e., the test will use a MultiSearcher accessing the remote indices (see the sketch after this list).
3. Indexing and searching will be done simultaneously -- indexing will be incremental and continual. There are no deletes.
4. The platform is Windows.
5. Both search and indexing time are essential, and so need to be balanced.
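
For concreteness, here's a rough sketch of how I plan to wire the search side, assuming each node exports its index pair over RMI as a RemoteSearchable wrapping a local searcher (the host and binding names below are placeholders):

Searchable pair1 = (Searchable) Naming.lookup("//node1/searchable"); // node serving pair 1 (7.5M small + 1M large docs)
Searchable pair2 = (Searchable) Naming.lookup("//node2/searchable"); // node serving pair 2, same shape
MultiSearcher searcher = new MultiSearcher(new Searchable[] { pair1, pair2 });
Hits hits = searcher.search(query); // hits merged across all the remote indices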


Based on some early measurements with small test sets, but mostly on first principles, I'm thinking of using the settings below. The index will take a long time to create and I'll probably get only one chance to prove what Lucene can do, so I'd appreciate any advice or experience that would suggest different settings:

index.setMaxBufferedDocs(10); // Buffer 10 documents at a time in memory (they could be big)
index.setMaxFieldLength(Integer.MAX_VALUE); // We do the limiting ourselves by what we pass in
index.setMaxMergeDocs(100000); // Yields about 75 large segments for 7.5 million docs, plus log2 smaller segments -- roughly 100 total
index.setMergeFactor(2); // Faster searches due to fewer (small) segments, but slower indexing due to more frequent merging
index.setSimilarity(similarity);
index.setTermIndexInterval(128); // Default. A larger number reduces memory at the cost of slower term access
index.setUseCompoundFile(true); // false could improve performance but will consume more file handles
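
For context, the indexing side will be a continual loop along these lines (feedHasMore() and getNextDocument() are stand-ins for our incremental feed). Since there are no deletes, the writer can stay open indefinitely; the search nodes just reopen their IndexSearchers periodically to pick up newly flushed segments:

IndexWriter index = new IndexWriter(indexDir, analyzer, false); // false = append to the existing index
// ... the settings above are applied here ...
while (feedHasMore()) { // stand-in for our feed
    Document doc = getNextDocument(); // build the next incremental document
    index.addDocument(doc); // buffered per setMaxBufferedDocs, merged per setMergeFactor/setMaxMergeDocs
}
index.close();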


Thanks for any suggestions!

Chuck

