In this case I would go with nodetool refresh, simply because it uses the machines more effectively: data is copied node-to-node, and each node cleans up and refreshes its own data. If the cluster setup is the same (same number of nodes and the same token assignments), there is no need to funnel all the data through one point and then stream it into the other cluster (a many-to-one copy followed by one-to-many streaming).
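The node-to-node workflow described above can be sketched roughly as below. Keyspace, table, and host names are hypothetical, and DRY_RUN makes the script echo each command instead of executing it against a real cluster:

```shell
#!/bin/sh
# Sketch of the nodetool-refresh path (hypothetical names: prod_ks, events,
# preprod-nodeN). DRY_RUN=echo prints each step instead of running it.
DRY_RUN=echo
KS=prod_ks
CF=events
TAG=preprod_copy

# 1. On each prod node: snapshot the keyspace (hard-links the live sstables).
$DRY_RUN nodetool snapshot -t "$TAG" "$KS"

# 2. Copy the snapshot sstables to the matching pre-prod node. With identical
#    token assignments, prod node N maps directly to pre-prod node N.
$DRY_RUN rsync -a "/var/lib/cassandra/data/$KS/$CF/snapshots/$TAG/" \
    "preprod-nodeN:/var/lib/cassandra/data/$KS/$CF/"

# 3. On each pre-prod node (after clearing the old sstables for this table):
#    tell Cassandra to pick up the newly placed files without a restart.
$DRY_RUN nodetool refresh "$KS" "$CF"
```

The data directory layout shown is the typical default; adjust the paths to your installation.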
Regards,
John

On 30 Jan 2014, at 06:45, Senthil, Athinanthny X. -ND <athinanthny.x.senthil....@disney.com> wrote:

We plan to back up and restore a keyspace from the PROD cluster to a PRE-PROD cluster that has the same number of nodes. The keyspace will have a few hundred million rows, and we need to do this every other week. Which of the options below is most time-efficient and puts the least stress on the target cluster? We want to finish the backup and restore in a low-usage time window.

Nodetool refresh
1. Take a snapshot on each individual prod node.
2. Copy the sstable data and index files to the pre-prod cluster (copy each snapshot to the corresponding node based on token assignment).
3. Clean up the old data.
4. Run nodetool refresh on every node.

Sstableloader
1. Take a snapshot on each individual prod node.
2. Copy the sstable data and index files from all nodes to one node in the pre-prod cluster.
3. Clean up the old data.
4. Run sstableloader to load the data into the target keyspace/CF. (Does sstableloader work against a cluster (without vnodes) where authentication is enabled?)

CQL3 COPY
I tried this for CFs with fewer than 1 million rows and it works fine, but for large CFs it throws an rpc_timeout error.

Any other suggestions?

AB SVENSKA SPEL, 621 80 Visby, Norra Hansegatan 17, Visby, Switchboard: +4610-120 00 00, https://svenskaspel.se
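For completeness, the sstableloader option from the question can be sketched as below. Host names, credentials, and the staging path are hypothetical, and DRY_RUN echoes the command instead of running it. sstableloader does accept -u/-pw credentials in recent Cassandra versions, which addresses the authentication question, though this should be verified against the version in use:

```shell
#!/bin/sh
# Sketch of the sstableloader path (hypothetical names: prod_ks, events,
# preprod-node1, /staging). DRY_RUN=echo prints the command instead of
# streaming anything.
DRY_RUN=echo
KS=prod_ks
CF=events

# sstableloader expects the sstables in a directory laid out as
# <parent>/<keyspace>/<table>/ and streams them to the cluster reachable
# via the -d contact point; the cluster itself handles placement, so the
# files from all prod nodes can be staged on a single machine.
$DRY_RUN sstableloader -d preprod-node1 \
    -u cassandra -pw cassandra \
    "/staging/$KS/$CF"
```

Note that streaming everything from one staging box concentrates load on that machine and the network path from it, which is the many-to-one/one-to-many cost mentioned in the reply above.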