On Wed, Apr 16, 2014 at 8:39 PM, Andrew Cooper <andrew.coo...@nisc.coop> wrote:
> It is becoming more and more evident that the most reliable option at
> this point would be to do an out-of-band rsync of a snapshot on dc1, with a
> custom sstable id de-duplication script paired with a
> refresh/compaction/cleanup on dc2 nodes as in [1].
>

That feel when someone refers you to a blog post you wrote... on a third
party site with no attribution...

http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

is the original, FWIW.

> Our biggest concern at this point is can we effectively rebuild a failed
> node with streaming/bootstrap or do we need to devise custom workflows
> (like above mentioned rsync) to quickly and reliably bring a node back to
> full load.
>

If you go down this route, you might wish to consider something like
tablesnap to create backup sets off-node. Tablesnap is AWS/S3-centric, but
presumably could be modified to allow backup/restore via another method.

https://github.com/synack/tablesnap

However, unless your nodes have tons of data, it should be possible to
convince streaming to work most of the time. Is your failure repeatable?

=Rob