Antony Bowesman wrote:
I have a design where I will be using multiple index shards to hold
approx 7.5 million documents per index per month over many years. These
will be large static R/O indexes but the corresponding smaller parallel
index will get many frequent changes.
I understand from previous replies by Hoss that the technique to handle
this is to use parallel indexes where the parallel index gets rebuilt
periodically with the changing data.
However, this 'periodically' needs to be quite frequent to try to
provide responsive changes to the index, potentially several times a
dat. One problem is that there can be updates to any of the data in
almost any month, so an update by a user to 120 documents, one document
per month for 10 years, requires a full rebuild of the 120 index shards
of 7.5m docs each...
I was wondering what the technical reasons were why a 'delete+add' could
not allow the original docId to be re-used, thus keeping the two
parallel indexes in sync without requiring a rebuild.
If this could be overcome, this would make this parallel index pattern
so much more useful for large volume data sets.
Any thoughts
I have a thought ;) Perhaps you could use a FilteredIndexReader to
maintain a map between new IDs and old IDs, and remap on the fly.
Although I think that some parts of Lucene depend on the fact that in a
normal index the IDs are monotonically increasing ... this would
complicate the issue.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]