I was trying to get hold of all the data as a kind of global snapshot of the cluster. I did the following:
I copied the snapshots from each individual node, where the snapshot data was around 12 GB per node, into a single common folder. Strangely, I found duplicate file names across the snapshots, and stranger still, the duplicates differed in size, so files overwrote each other and the total came to only about 13 GB, whereas the expectation was 12 * 6 = 72 GB. Does that mean that if I need to create a new ring with the same data as the existing one I can't just do that, or should I start with the 13 GB copy and check whether all the data is present, which sounds pretty illogical? Please suggest.

________________________________
From: Shubham Srivastava
Sent: Thursday, April 26, 2012 12:43 PM
To: 'user@cassandra.apache.org'
Subject: Re: Taking a Cluster Wide Snapshot

Your second part is what I was also referring to, where I put all the files from the nodes onto a single node to create a similar backup, which requires unique file names across the cluster.

From: Deno Vichas [mailto:d...@syncopated.net]
Sent: Thursday, April 26, 2012 12:29 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot

there's no prerequisite for unique names. each node's snapshot gets tar'ed up and then copied into a directory named after the node's hostname. then those dirs are tar'ed and copied to S3. what i haven't tried yet is untarring everything from all nodes into a single-node cluster. i'm assuming i can get tar to replace or skip existing files so i end up with a set of unique files. can somebody confirm this?

On 4/25/2012 11:45 PM, Shubham Srivastava wrote:

Thanks a lot Deno. A bit surprised that an equivalent command isn't available in nodetool; not sure if it is in the latest release. BTW, this makes it a prerequisite that all of Cassandra's data files, be they indexes, filters, etc., have unique names across the cluster. Is this a reasonable assumption to make?
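The shrinking-folder effect described above can be reproduced in miniature: copying two nodes' snapshots into one flat folder silently clobbers same-named SSTables, while a per-node sub-directory keeps everything. This is a sketch with made-up node and file names, not the poster's actual layout:

```shell
# Fake snapshots from two nodes that happen to contain an SSTable
# with the same name (names here are assumptions for illustration).
mkdir -p snapshots/node1 snapshots/node2 merged per-node/node1 per-node/node2
echo "node1 data" > snapshots/node1/ks-cf-hd-1-Data.db
echo "node2 data" > snapshots/node2/ks-cf-hd-1-Data.db

# Flat copy: the second node's file silently overwrites the first one's,
# which is why a merged folder can shrink toward 13 GB instead of 72 GB.
for node in node1 node2; do
    cp snapshots/$node/* merged/
done
ls merged | wc -l        # 1 -- only one copy of the duplicate name survives

# Safer: keep one sub-directory per node so nothing collides.
for node in node1 node2; do
    cp snapshots/$node/* per-node/$node/
done
ls per-node/node1 per-node/node2 | grep -c Data.db   # 2 -- both copies kept
```

This matches the per-hostname-directory layout Deno describes below: as long as each node's files live under their own directory, duplicate SSTable names across nodes are harmless.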
Regards,
Shubham

________________________________
From: Deno Vichas [d...@syncopated.net]
Sent: Thursday, April 26, 2012 12:09 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot

On 4/25/2012 11:34 PM, Shubham Srivastava wrote:

What's the best way (or the only way) to take a cluster-wide backup of Cassandra? I can't find much documentation on it. I am using a multi-DC setup with Cassandra 0.8.6.

Regards,
Shubham

here's how i'm doing it in AWS land using the DataStax AMI via a nightly cron job. you'll need pssh and s3cmd:

#!/bin/bash
cd /home/ec2-user/ops

echo "making snapshots"
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 clearsnapshot stocktouch'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 snapshot stocktouch'

echo "making tar balls"
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'rm `hostname`-cassandra-snapshot.tar.gz'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'tar -zcvf `hostname`-cassandra-snapshot.tar.gz /raid0/cassandra/data/stocktouch/snapshots'

echo "copying tar balls"
pslurp -h prod-cassandra-nodes.txt -l ubuntu /home/ubuntu/*cassandra-snapshot.tar.gz .

echo "tar'ing tar balls"
tar -cvf cassandra-snapshots-all-nodes.tar 10*

echo "pushing to S3"
../s3cmd-1.1.0-beta3/s3cmd put cassandra-snapshots-all-nodes.tar s3://stocktouch-backups

echo "DONE!"
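On Deno's open question about getting tar to replace or skip existing files: by default tar overwrites on extract, and recent GNU tar (1.28+) offers `--skip-old-files` to leave existing files untouched instead. A small sketch with made-up file names, assuming GNU tar:

```shell
# Two "node" tarballs that both contain a file with the same name.
mkdir -p demo && cd demo
echo "from node1" > ks-cf-hd-1-Data.db
tar -czf node1.tar.gz ks-cf-hd-1-Data.db
echo "from node2" > ks-cf-hd-1-Data.db
tar -czf node2.tar.gz ks-cf-hd-1-Data.db

mkdir -p restore && cd restore
tar -xzf ../node1.tar.gz
# Default behaviour: extracting the second tarball overwrites node1's copy.
tar -xzf ../node2.tar.gz
cat ks-cf-hd-1-Data.db        # from node2

# GNU tar >= 1.28: --skip-old-files keeps the existing file instead of
# replacing it (an assumption about the tar version on the restore box).
tar -xzf ../node1.tar.gz --skip-old-files
cat ks-cf-hd-1-Data.db        # still from node2
```

So "replace" is the default and "skip" needs an explicit flag; either way, when two nodes legitimately hold different data under the same SSTable name, one copy is lost, which is the argument for restoring into per-node directories rather than one flat folder.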