In this case I would go with nodetool refresh, simply because it uses the 
machines more effectively: data is copied node to node, and each node cleans 
and refreshes its own data. If the cluster setup is the same in terms of 
nodes/tokens, there is no need to copy all the data to one point and then 
stream it into the other cluster (copy many-to-one, then stream one-to-many).

Regards,
john


On 30 Jan 2014, at 06:45, Senthil, Athinanthny X. -ND 
<athinanthny.x.senthil....@disney.com<mailto:n...@disney.com>>:

We plan to back up and restore a keyspace from the PROD cluster to a PRE-PROD 
cluster that has the same number of nodes. The keyspace will have a few hundred 
million rows, and we need to do this every other week. Which of the options 
below is the most time-efficient and puts the least stress on the target 
cluster? We want to finish the backup and restore within a low-usage time window.
Nodetool refresh
1.      Take a snapshot on each node in prod
2.      Copy the sstable data and index files to the pre-prod cluster (copy the 
snapshots to the respective nodes based on token assignment)
3.      Clean up the old data
4.      Run nodetool refresh on every node
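The steps above could be sketched roughly as follows. This is a dry-run sketch that only prints the commands it would run; the keyspace, table, snapshot tag, host names, and data paths are placeholders I've assumed, not details from the thread, so adjust them to your cluster layout before using anything like this.

```shell
# Placeholders (assumptions, not from the thread):
KEYSPACE="myks"          # hypothetical keyspace
TABLE="mytable"          # hypothetical column family
TAG="preprod-restore"    # snapshot tag

# 1. On each prod node: snapshot the keyspace under a known tag
echo "nodetool snapshot -t $TAG $KEYSPACE"

# 2. Copy the snapshot sstables to the pre-prod node that owns the
#    same token range (node N in prod -> node N in pre-prod)
echo "rsync -av /var/lib/cassandra/data/$KEYSPACE/$TABLE/snapshots/$TAG/ \
preprod-nodeN:/var/lib/cassandra/data/$KEYSPACE/$TABLE/"

# 3. On each pre-prod node: clear out the old sstables first
echo "rm -f /var/lib/cassandra/data/$KEYSPACE/$TABLE/*.db"

# 4. Pick up the newly copied sstables without restarting the node
echo "nodetool refresh $KEYSPACE $TABLE"
```

Because the sstables are copied directly to the nodes that own them, nothing needs to be streamed and the load is spread evenly across the cluster.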

Sstableloader
1.      Take a snapshot on each node in prod
2.      Copy the sstable data and index files from all nodes to one node in the 
pre-prod cluster
3.      Clean up the old data
4.      Run sstableloader to load the data into the respective keyspace/CF. 
(Does sstableloader work against a cluster (without vnodes) where 
authentication is enabled?)
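For comparison, the sstableloader path could look roughly like the sketch below. Again this only prints the commands; the staging path, host names, and credentials are assumptions. On recent versions, sstableloader accepts credentials via `-u`/`-pw`, so it can run against a cluster with authentication enabled, but verify this against your Cassandra version.

```shell
# Placeholders (assumptions, not from the thread):
KEYSPACE="myks"
TABLE="mytable"

# Data from all prod nodes is copied into one staging directory laid out
# as <staging>/<keyspace>/<table>/ -- sstableloader infers keyspace and
# table from this directory structure.
STAGING="/tmp/staging/$KEYSPACE/$TABLE"

# sstableloader streams each row to whichever pre-prod nodes own it, so
# matching token assignments are not required -- at the cost of funnelling
# all data through one loading node and streaming it back out.
echo "sstableloader -d preprod-node1,preprod-node2 -u cassandra -pw secret $STAGING"
```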

CQL3 COPY
I tried this for CFs that have <1 million rows and it works fine, but for 
large CFs it throws an rpc_timeout error.
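For reference, the COPY approach is driven from cqlsh roughly as printed below (host, keyspace, table, and file path are placeholders I've assumed). COPY pulls every row through a single coordinator, which is why very large column families tend to hit server-side timeouts such as the rpc_timeout error mentioned above; raising the server-side request timeouts or splitting the export into smaller ranges are the usual workarounds.

```shell
# Placeholders (assumptions, not from the thread):
KS="myks"
TABLE="mytable"
CSV="/tmp/${TABLE}.csv"

# Export from prod, then import into pre-prod, via cqlsh's COPY command
echo "cqlsh prod-node1 -e \"COPY ${KS}.${TABLE} TO '${CSV}';\""
echo "cqlsh preprod-node1 -e \"COPY ${KS}.${TABLE} FROM '${CSV}';\""
```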
Any other suggestions?

AB SVENSKA SPEL
621 80 Visby
Norra Hansegatan 17, Visby
Switchboard: +4610-120 00 00
https://svenskaspel.se

