Hi,

This all depends on your index contents and hardware. In general, the size of a single index or index segment versus multiple segments/indexes is not an issue on a single machine. To scale further, you should also think about using more than one machine, e.g. with Elasticsearch or Apache Solr instead of plain Lucene (both provide that functionality out of the box). On a single machine, the only way to speed things up is parallelization.
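To make that parallelization concrete: a simple scheme is to route each document to one of N sub-indexes by hashing a stable document key, which is the hash-based distribution recommended later in this mail. This is only a sketch; the ShardRouter class and the choice of numShards are illustrative, not part of Lucene's API.

```java
import java.util.HashMap;
import java.util.Map;

public class ShardRouter {
    // Number of sub-indexes; often chosen to match the number of CPU cores.
    private final int numShards;

    public ShardRouter(int numShards) {
        this.numShards = numShards;
    }

    // Map a stable document key (e.g. its primary key) to a shard number.
    // floorMod keeps the result non-negative even for negative hash codes.
    public int shardFor(String docKey) {
        return Math.floorMod(docKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            counts.merge(router.shardFor("doc-" + i), 1, Integer::sum);
        }
        // Each shard should receive a roughly equal share of documents.
        System.out.println(counts);
    }
}
```

Because the routing is deterministic, the same key always lands in the same sub-index, so updates and deletes can find their document again.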
> 1. What is the average acceptable size for a Lucene index that is considered
> OK for searching? (before it is broken down into multiple indexes)
> 2. Other than performance, what should be the criteria to decide on separating
> the index into multiple indexes? (Criteria like: a single file in the index
> should not be more than 2GB, or the total Lucene index folder size should not
> be above 10GB, etc.)

It depends. On a single machine it does not matter how big the files are. When searching, an index consisting of several sub-indexes/segments behaves almost identically to one big optimized one. The difference only shows up when you parallelize.

> (Regarding code changes required to break the documents into the appropriate
> year)
> I will be reindexing all the documents again using a modified code base. For
> that I will be required to
>
> 3. Create multiple IndexWriters and index each document using the appropriate
> writer as per the date of the document.

That's fine. The question is whether it makes sense: will the results of search queries come from all indexes equally distributed? If you want to parallelize, it is often better to use a hash-based distribution of documents instead of a date-based one.

> 4. While searching, use MultiSearcher or ParallelMultiSearcher to search
> across all indexes at once.

MultiSearcher and ParallelMultiSearcher are deprecated and broken (and no longer supported). The correct way to search different indexes is to wrap all sub-indexes in a MultiReader and then use a single IndexSearcher on top of it. To parallelize, pass an ExecutorService to the IndexSearcher constructor. Please note: IndexSearcher can only parallelize if there are sub-indexes, so one big optimized index does not help here :-)

Ideally you would create several separate indexes using a hash-based distribution of documents.

Uwe

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
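As a footnote to the IndexSearcher advice above: the fan-out-and-merge pattern it performs (search each sub-index on its own thread, then merge the per-shard top hits) can be sketched in plain Java. The Hit record, searchShard, and searchAll below are illustrative names standing in for Lucene's ScoreDoc and per-leaf search, not Lucene API; in real code you would instead wrap the sub-readers in a MultiReader and hand an ExecutorService to IndexSearcher.

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelSearchSketch {
    // A scored hit; in Lucene this role is played by ScoreDoc.
    public record Hit(int docId, float score) {}

    // "Search" one sub-index. Here a sub-index is just a map of
    // docId -> score for the query; real code would search a leaf reader.
    public static List<Hit> searchShard(Map<Integer, Float> shard, int topN) {
        return shard.entrySet().stream()
                .map(e -> new Hit(e.getKey(), e.getValue()))
                .sorted(Comparator.comparingDouble((Hit h) -> -h.score))
                .limit(topN)
                .toList();
    }

    // Search all shards in parallel and merge their top hits, which is
    // what IndexSearcher does when constructed with an ExecutorService.
    public static List<Hit> searchAll(List<Map<Integer, Float>> shards, int topN)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (Map<Integer, Float> shard : shards) {
                futures.add(pool.submit(() -> searchShard(shard, topN)));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : futures) {
                merged.addAll(f.get()); // waits for each shard's result
            }
            merged.sort(Comparator.comparingDouble(h -> -h.score));
            return merged.subList(0, Math.min(topN, merged.size()));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Map<Integer, Float>> shards = List.of(
                Map.of(1, 0.9f, 2, 0.4f),
                Map.of(3, 0.7f),
                Map.of(4, 0.95f, 5, 0.1f));
        // Prints the merged top hits across all shards.
        System.out.println(searchAll(shards, 3));
    }
}
```

Note how the merge step only sees each shard's top N hits, not every match; this is why splitting into sub-indexes helps, and why one big optimized index leaves nothing to hand out to the thread pool.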