> sleep 10 seconds

If you are trying to prevent writes from being lost in the typical case,
you need to wait *first* for all nodes to understand that the node is
down (as Chris mentioned). At that point, no node should be sending it
new writes. Assuming you also disabled the thrift interface, no writes
will be submitted locally either.

THEN, after that point, you have to wait out the 10 second periodic
sync interval to be reasonably sure the commit log has been flushed.
I'd wait more, say 12 seconds, just to have some margin in case the
node is overloaded.
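
To make the ordering concrete, here is a rough sketch of that sequence
in Python. It is only an illustration under assumptions, not a tested
procedure: it assumes nodetool is on the PATH and that your version
still has the disablethrift and disablegossip commands; the function
name and the gossip_settle_s parameter (how long you allow for the
other nodes to mark this one down) are made up for the example, and
you have to pick that value for your own cluster.

```python
import subprocess
import time

SYNC_PERIOD_S = 10  # periodic commit log sync interval discussed above
MARGIN_S = 2        # extra margin, giving the 12 seconds suggested above


def quiesce_before_kill(host, gossip_settle_s,
                        run=subprocess.check_call, sleep=time.sleep):
    """Stop new writes reaching the node, then wait for a commit log sync.

    gossip_settle_s is a hypothetical, cluster-specific guess at how long
    the other nodes need to notice that this node is down.
    run and sleep are injectable so the sequence can be dry-run.
    """
    # 1. Stop accepting client writes locally.
    run(["nodetool", "-h", host, "disablethrift"])
    # 2. Stop gossiping so the other nodes mark this node as down.
    run(["nodetool", "-h", host, "disablegossip"])
    # 3. FIRST wait until the rest of the cluster has noticed.
    sleep(gossip_settle_s)
    # 4. THEN wait out the sync period plus margin before killing.
    sleep(SYNC_PERIOD_S + MARGIN_S)
```

Calling quiesce_before_kill("127.0.0.1", 30) and then killing the
process gives you the order described above. It is still best effort
only, for the reasons below.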

Still, just be aware there's no *guarantee*. E.g., if the commit log
writer is stalled due to disk saturation, data older than 10 seconds
may still be un-flushed at the point of kill. The way to get a
"guarantee" is to use batch commit log sync (commitlog_sync: batch in
cassandra.yaml) instead of periodic sync.

If you're doing QUORUM you probably don't care, but if you're doing
ONE and want an actual *guarantee* that writes aren't lost, you might.
But on the other hand if you want a guarantee you shouldn't be using
ONE anyway since a node can crash at any time.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
