Shashi Kant wrote:
Is there an "elegant" approach to partitioning a large Lucene index (~1TB)
into smaller sub-indexes other than the obvious method of re-indexing into
partitions?
Any ideas?

Try the following:

* open your index for writing, and mark all documents as deleted except the 1/Nth that should fill the first shard. Close the index, BUT DO NOT OPTIMIZE IT! (Optimizing would physically merge away the deleted documents, making it impossible to undelete them later.)

* create an IndexWriter for the first shard, and use addIndexes() to add the original index. Only the non-deleted documents will be copied.

* open the original index and use undeleteAll() to revert the deletions.

* mark all documents as deleted except the next 1/Nth, and build the next shard the same way
...
* repeat the cycle as many times as needed
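The cycle above could be sketched roughly as follows. This is only a sketch, assuming a pre-4.0 (3.x) Lucene API where IndexReader still exposes deleteDocument()/undeleteAll(); the class name, directory layout, and round-robin slicing are my own placeholders, not part of the original suggestion:

```java
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class IndexPartitioner {

    /** Copies the source index into numShards smaller indexes, one pass per shard. */
    public static void partition(File src, File destBase, int numShards) throws Exception {
        Directory source = FSDirectory.open(src);
        for (int shard = 0; shard < numShards; shard++) {
            // 1. Mark everything outside this shard's slice as deleted.
            IndexReader reader = IndexReader.open(source, false); // open read-write
            int maxDoc = reader.maxDoc();
            for (int doc = 0; doc < maxDoc; doc++) {
                if (doc % numShards != shard && !reader.isDeleted(doc)) {
                    reader.deleteDocument(doc);
                }
            }
            reader.close(); // commits the deletions -- do NOT optimize here

            // 2. addIndexes() copies only the non-deleted documents.
            IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File(destBase, "shard" + shard)),
                new IndexWriterConfig(Version.LUCENE_36,
                                      new StandardAnalyzer(Version.LUCENE_36)));
            IndexReader toCopy = IndexReader.open(source); // read-only view
            writer.addIndexes(toCopy);
            toCopy.close();
            writer.close();

            // 3. Revert the deletions in the source before the next pass.
            IndexReader undelete = IndexReader.open(source, false);
            undelete.undeleteAll();
            undelete.close();
        }
        source.close();
    }
}
```

One caveat: if the source index already contains real deletions, undeleteAll() will resurrect those too, so this only works cleanly on an index without pre-existing deletes (or after the deletes have been merged away first).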

A more elegant version of this algorithm can be implemented using FilterIndexReader.
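The FilterIndexReader variant avoids touching the source index at all: the wrapper only pretends that the out-of-slice documents are deleted, so no undelete pass is needed. A minimal sketch of the idea, again assuming the 3.x API (SliceReader is a hypothetical name; depending on the Lucene version the merge code may consult more than isDeleted(), so a complete implementation, such as the MultiPassIndexSplitter in Lucene's contrib, overrides additional methods):

```java
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.util.OpenBitSet;

// Wraps a reader and reports every document outside the chosen
// slice as deleted, without writing anything to the source index.
public class SliceReader extends FilterIndexReader {
    private final OpenBitSet live; // bits set for docs belonging to this shard

    public SliceReader(IndexReader in, int shard, int numShards) {
        super(in);
        live = new OpenBitSet(in.maxDoc());
        for (int doc = 0; doc < in.maxDoc(); doc++) {
            if (!in.isDeleted(doc) && doc % numShards == shard) {
                live.set(doc);
            }
        }
    }

    @Override public boolean hasDeletions() { return true; }
    @Override public boolean isDeleted(int n) { return !live.get(n); }
    @Override public int numDocs() { return (int) live.cardinality(); }
}
```

Each shard is then built with a single writer.addIndexes(new SliceReader(reader, shard, numShards)) call, leaving the 1TB source untouched throughout.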

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

