Shashi Kant wrote:
Is there an "elegant" approach to partitioning a large Lucene index (~1TB)
into smaller sub-indexes other than the obvious method of re-indexing into
partitions?
Any ideas?
Try the following:
* open your index, and mark all documents as deleted except 1/Nth that
should fill the first shard. Close the index, BUT DO NOT OPTIMIZE IT!
* create IndexWriter, and use addIndexes to add the original index. Only
non-deleted docs will be copied.
* open the original index and use undeleteAll() to revert the deletions.
* mark the next 1/Nth documents as deleted
...
* repeat the cycle as many times as needed
A more elegant version of this algorithm can be implemented using
FilterIndexReader.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org