Shashi Kant wrote:
Is there an "elegant" approach to partitioning a large Lucene index (~1TB)
into smaller sub-indexes other than the obvious method of re-indexing into
partitions?
Any ideas?

Try the following:

* open your index for writing, and mark all documents as deleted except the 1/Nth that should fill the first shard. Close the index, BUT DO NOT OPTIMIZE IT! (Optimizing would physically merge away the deleted documents, making it impossible to undelete them later.)

* create an IndexWriter for the first shard, and use addIndexes() to add the original index. Only the non-deleted documents will be copied.

* open the original index and use undeleteAll() to revert the deletions.

* mark all documents as deleted except the next 1/Nth, and build the next shard the same way
...
* repeat the cycle as many times as needed
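The cycle above could be sketched roughly as follows. This is only a sketch, assuming a pre-4.0 (3.x) Lucene API where IndexReader still exposes deleteDocument()/undeleteAll(); the class name, directory layout, and round-robin slicing are my own placeholders, not part of the original suggestion:

```java
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class IndexPartitioner {

    /** Copies the source index into numShards smaller indexes, one pass per shard. */
    public static void partition(File src, File destBase, int numShards) throws Exception {
        Directory source = FSDirectory.open(src);
        for (int shard = 0; shard < numShards; shard++) {
            // 1. Mark everything outside this shard's slice as deleted.
            IndexReader reader = IndexReader.open(source, false); // open read-write
            int maxDoc = reader.maxDoc();
            for (int doc = 0; doc < maxDoc; doc++) {
                if (doc % numShards != shard && !reader.isDeleted(doc)) {
                    reader.deleteDocument(doc);
                }
            }
            reader.close(); // commits the deletions -- do NOT optimize here

            // 2. addIndexes() copies only the non-deleted documents.
            IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File(destBase, "shard" + shard)),
                new IndexWriterConfig(Version.LUCENE_36,
                                      new StandardAnalyzer(Version.LUCENE_36)));
            IndexReader toCopy = IndexReader.open(source); // read-only view
            writer.addIndexes(toCopy);
            toCopy.close();
            writer.close();

            // 3. Revert the deletions in the source before the next pass.
            IndexReader undelete = IndexReader.open(source, false);
            undelete.undeleteAll();
            undelete.close();
        }
        source.close();
    }
}
```

One caveat: if the source index already contains real deletions, undeleteAll() will resurrect those too, so this only works cleanly on an index without pre-existing deletes (or after the deletes have been merged away first).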

A more elegant version of this algorithm can be implemented using FilterIndexReader.
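The FilterIndexReader variant avoids touching the source index at all: the wrapper only pretends that the out-of-slice documents are deleted, so no undelete pass is needed. A minimal sketch of the idea, again assuming the 3.x API (SliceReader is a hypothetical name; depending on the Lucene version the merge code may consult more than isDeleted(), so a complete implementation, such as the MultiPassIndexSplitter in Lucene's contrib, overrides additional methods):

```java
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.util.OpenBitSet;

// Wraps a reader and reports every document outside the chosen
// slice as deleted, without writing anything to the source index.
public class SliceReader extends FilterIndexReader {
    private final OpenBitSet live; // bits set for docs belonging to this shard

    public SliceReader(IndexReader in, int shard, int numShards) {
        super(in);
        live = new OpenBitSet(in.maxDoc());
        for (int doc = 0; doc < in.maxDoc(); doc++) {
            if (!in.isDeleted(doc) && doc % numShards == shard) {
                live.set(doc);
            }
        }
    }

    @Override public boolean hasDeletions() { return true; }
    @Override public boolean isDeleted(int n) { return !live.get(n); }
    @Override public int numDocs() { return (int) live.cardinality(); }
}
```

Each shard is then built with a single writer.addIndexes(new SliceReader(reader, shard, numShards)) call, leaving the 1TB source untouched throughout.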

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

