On Wed, Jan 29, 2014 at 9:45 PM, Senthil, Athinanthny X. -ND < athinanthny.x.senthil....@disney.com> wrote:
> Plan to back up a keyspace from the PROD cluster and restore it to a
> PRE-PROD cluster that has the same number of nodes. The keyspace will have
> a few hundred million rows. We need to do this every other week. Which of
> the options below is the most time-efficient and puts the least stress on
> the target cluster? We want to finish the backup and restore in a
> low-usage time window.
>

http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

This post has some details on when each approach may be better or worse.

In your case, you should probably just use the "copy-the-sstables" method. If the target cluster has the same number of nodes, assign it the same tokens and then copy the SSTables from SOURCE_NODE_A to TARGET_NODE_A, and so on. If you do that, you don't even have to run cleanup, because no node's range ownership has changed.

Don't use refresh if you don't need to. Instead, (coalesce the target cluster, load the schema, and then) copy the SSTables into the data directory while the node is down, and then start it. Refresh's current design is unsafe:

https://issues.apache.org/jira/browse/CASSANDRA-6245
https://issues.apache.org/jira/browse/CASSANDRA-6514

=Rob
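The same-token "copy-the-sstables" approach described above can be sketched in Python. This is a minimal sketch, not a tested tool: `pair_nodes`, `initial_token_line` and `copy_sstables` are hypothetical helper names, not part of Cassandra; the source tokens are assumed to have been collected beforehand (for example from `nodetool ring` on PROD); and the table directory paths are passed in explicitly because the on-disk layout is an assumption that may differ between Cassandra versions. The host names and tokens in the usage block are made up.

```python
from pathlib import Path
import shutil


def pair_nodes(source_tokens, target_hosts):
    """Map each target host to the (source host, token list) it should mirror.

    source_tokens: ordered list of (source_host, [token, ...]) pairs.
    target_hosts: target hosts in the same order, one per source node.
    """
    if len(source_tokens) != len(target_hosts):
        raise ValueError("clusters must have the same number of nodes")
    return {
        target: (source, list(tokens))
        for (source, tokens), target in zip(source_tokens, target_hosts)
    }


def initial_token_line(tokens):
    """Render the cassandra.yaml initial_token setting for one target node."""
    return "initial_token: " + ",".join(str(t) for t in tokens)


def copy_sstables(src_table_dir, dst_table_dir):
    """Copy every SSTable component file from one table directory to another.

    Meant to run while the target node is stopped, so the node never sees a
    partially copied SSTable. Existing files are never overwritten.
    Returns the names of the copied files in sorted order.
    """
    src, dst = Path(src_table_dir), Path(dst_table_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(src.iterdir()):
        if not f.is_file():
            continue  # skip subdirectories such as snapshots/
        target = dst / f.name
        if target.exists():
            raise FileExistsError(target)
        shutil.copy2(f, target)  # copy2 also preserves file timestamps
        copied.append(f.name)
    return copied


if __name__ == "__main__":
    # Hypothetical hosts and tokens, for illustration only.
    plan = pair_nodes(
        [("prod-a", ["-9223372036854775808"]), ("prod-b", ["0"])],
        ["preprod-a", "preprod-b"],
    )
    for target, (source, tokens) in plan.items():
        print(f"{target} mirrors {source}: {initial_token_line(tokens)}")
```

Refusing to overwrite existing files keeps an accidental second run from silently clobbering SSTables on the target; the copy itself should still only happen with the target node down, as the post recommends.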