On 5/24/10 3:10 PM, [email protected] wrote:
Hi all,
It seems to me that the “commit” logic in the Solr updateRequestHandler
(or wherever the logic is actually located) conflates two different
semantics. One semantic is what you need to do to make the index process
perform well. The other semantic is guaranteed atomicity of document
reception by Solr.
In particular, it would be nice to be able to post documents in such a
way that you can guarantee that the document is permanently in Solr’s
queue, safe in the event of a Solr restart, etc., even if the document
has not yet been “committed”.
This issue came up in the LCF talk that I gave, and I initially thought
that separating the two kinds of events would necessarily be an LCF
change, but the more I thought about it the more I realized that other
Solr indexing clients may also benefit from such a separation.
Does anyone agree? Where should this logic properly live?
Thanks,
Karl
It's an interesting idea, but I think you would likely pay a similar
cost to guarantee reception as you would to commit (also, I'm not sure
Lucene guarantees it: commit works for consistency, but I'm not so sure
it achieves durability).
I can think of two things offhand.
Perhaps store the text and use fsync to quasi-guarantee acceptance,
then index from the store on the commit.
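
A minimal sketch of that first idea, in Python rather than inside Solr. The class name DurableQueue, its methods, and the one-JSON-document-per-line log format are all hypothetical and not part of any Solr or Lucene API. accept() returns only after the document has been fsynced to an append-only log, and commit() later feeds everything in the log to an indexing callback before truncating it.

```python
import json
import os


class DurableQueue:
    """Append-only document log (hypothetical; not a Solr or Lucene API)."""

    def __init__(self, path):
        self.path = path

    def accept(self, doc):
        """Append one document and return only after it has been fsynced."""
        line = json.dumps(doc).encode("utf-8") + b"\n"
        with open(self.path, "ab") as fh:
            fh.write(line)
            fh.flush()                # push Python's buffer to the OS
            os.fsync(fh.fileno())     # ask the OS to push it to stable storage

    def pending(self):
        """Return every accepted but not yet committed document."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, "rb") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def commit(self, index_fn):
        """Index all pending documents, then empty the log."""
        docs = self.pending()
        for doc in docs:
            index_fn(doc)
        # Truncate only after indexing succeeded; a crash before this point
        # replays the same documents, so index_fn should be idempotent.
        with open(self.path, "wb") as fh:
            fh.flush()
            os.fsync(fh.fileno())
        return len(docs)
```

A fresh DurableQueue opened on the same path after a restart sees the same pending documents, which is the "safe in the event of a restart" property being asked for. The sketch does not fsync the containing directory, and fsync itself is only as trustworthy as the underlying disk and filesystem, hence "quasi".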
Another, simpler idea, if only the separation is important and not the
performance: index to a separate side index, taking advantage of Lucene's
current commit functionality, and then use addIndexes to merge it into
the main index on commit.
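
A toy model of the side-index idea, again in Python and not using Lucene. The class TwoStageIndex and its in-memory term maps are hypothetical stand-ins. It only illustrates the separation between "received" and "searchable"; in the real proposal the side index would be a Lucene index made durable by its own commit, which this sketch does not model.

```python
from collections import defaultdict


class TwoStageIndex:
    """Toy inverted index with a side stage (hypothetical; not Lucene)."""

    def __init__(self):
        self.main = defaultdict(set)  # term -> doc ids visible to searches
        self.side = defaultdict(set)  # term -> doc ids received, not yet visible

    def add(self, doc_id, text):
        """Index a document into the side index only."""
        for term in text.lower().split():
            self.side[term].add(doc_id)

    def search(self, term):
        """Search the main index; side-index documents are not visible."""
        return sorted(self.main.get(term.lower(), set()))

    def commit(self):
        """Merge the side index into the main index, then reset it."""
        for term, ids in self.side.items():
            self.main[term] |= ids
        self.side.clear()
```

Documents added through add() stay invisible to search() until commit() merges them, which plays the role of the merge into the main index.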
Just spitballing, though.
I think this would obviously need to be an optional mode.
--
- Mark
http://www.lucidimagination.com