I was trying to get hold of all the data as a kind of global snapshot of the cluster. I did the following:
I copied the snapshots from each individual node, where the snapshot data was around 12 GB per node, into a single common folder. Strangely, I found duplicate file names across the snapshots, and stranger still, the duplicates differed in size, so files overwrote each other and the total came to only about 13 GB, whereas the expectation was 12 * 6 = 72 GB. Does that mean that if I need to create a new ring with the same data as the existing one I can't just do that, or should I start with the 13 GB copy and check whether all the data is present, which sounds pretty illogical? Please suggest.

________________________________
From: Shubham Srivastava
Sent: Thursday, April 26, 2012 12:43 PM
To: 'user@cassandra.apache.org'
Subject: Re: Taking a Cluster Wide Snapshot

Your second part is what I was also referring to, where I put all the files from the nodes onto a single node to create a similar backup, which requires unique file names across the cluster.

From: Deno Vichas [mailto:d...@syncopated.net]
Sent: Thursday, April 26, 2012 12:29 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot

there's no prerequisite for unique names. each node's snapshot gets tar'ed up and then copied into a directory named after the node's hostname. then those dirs are tar'ed and copied to S3. what i haven't tried yet is untarring everything from all nodes into a single-node cluster. i'm assuming i can get tar to replace or skip existing files so i end up with a set of unique files. can somebody confirm this?

On 4/25/2012 11:45 PM, Shubham Srivastava wrote:

Thanks a lot Deno. A bit surprised that an equivalent command isn't available in nodetool; not sure if it is in the latest release. BTW, this makes it a prerequisite that all of Cassandra's data files, be they indexes, filters, etc., have unique names across the cluster. Is this a reasonable assumption to make?
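The shrinking-folder effect described above can be reproduced in miniature: copying two nodes' snapshots into one flat folder silently clobbers same-named SSTables, while a per-node sub-directory keeps everything. This is a sketch with made-up node and file names, not the poster's actual layout:

```shell
# Fake snapshots from two nodes that happen to contain an SSTable
# with the same name (names here are assumptions for illustration).
mkdir -p snapshots/node1 snapshots/node2 merged per-node/node1 per-node/node2
echo "node1 data" > snapshots/node1/ks-cf-hd-1-Data.db
echo "node2 data" > snapshots/node2/ks-cf-hd-1-Data.db

# Flat copy: the second node's file silently overwrites the first one's,
# which is why a merged folder can shrink toward 13 GB instead of 72 GB.
for node in node1 node2; do
    cp snapshots/$node/* merged/
done
ls merged | wc -l        # 1 -- only one copy of the duplicate name survives

# Safer: keep one sub-directory per node so nothing collides.
for node in node1 node2; do
    cp snapshots/$node/* per-node/$node/
done
ls per-node/node1 per-node/node2 | grep -c Data.db   # 2 -- both copies kept
```

This matches the per-hostname-directory layout Deno describes below: as long as each node's files live under their own directory, duplicate SSTable names across nodes are harmless.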
Regards,
Shubham

________________________________
From: Deno Vichas [d...@syncopated.net]
Sent: Thursday, April 26, 2012 12:09 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot

On 4/25/2012 11:34 PM, Shubham Srivastava wrote:

What's the best way (or the only way) to take a cluster-wide backup of Cassandra? I can't find much documentation on it. I am using a multi-DC setup with Cassandra 0.8.6.

Regards,
Shubham

here's how i'm doing it in AWS land using the DataStax AMI via a nightly cron job. you'll need pssh and s3cmd:

#!/bin/bash
cd /home/ec2-user/ops

echo "making snapshots"
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 clearsnapshot stocktouch'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 snapshot stocktouch'

echo "making tar balls"
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'rm `hostname`-cassandra-snapshot.tar.gz'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'tar -zcvf `hostname`-cassandra-snapshot.tar.gz /raid0/cassandra/data/stocktouch/snapshots'

echo "copying tar balls"
pslurp -h prod-cassandra-nodes.txt -l ubuntu /home/ubuntu/*cassandra-snapshot.tar.gz .

echo "tar'ing tar balls"
tar -cvf cassandra-snapshots-all-nodes.tar 10*

echo "pushing to S3"
../s3cmd-1.1.0-beta3/s3cmd put cassandra-snapshots-all-nodes.tar s3://stocktouch-backups

echo "DONE!"
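On Deno's open question about getting tar to replace or skip existing files: by default tar overwrites on extract, and recent GNU tar (1.28+) offers `--skip-old-files` to leave existing files untouched instead. A small sketch with made-up file names, assuming GNU tar:

```shell
# Two "node" tarballs that both contain a file with the same name.
mkdir -p demo && cd demo
echo "from node1" > ks-cf-hd-1-Data.db
tar -czf node1.tar.gz ks-cf-hd-1-Data.db
echo "from node2" > ks-cf-hd-1-Data.db
tar -czf node2.tar.gz ks-cf-hd-1-Data.db

mkdir -p restore && cd restore
tar -xzf ../node1.tar.gz
# Default behaviour: extracting the second tarball overwrites node1's copy.
tar -xzf ../node2.tar.gz
cat ks-cf-hd-1-Data.db        # from node2

# GNU tar >= 1.28: --skip-old-files keeps the existing file instead of
# replacing it (an assumption about the tar version on the restore box).
tar -xzf ../node1.tar.gz --skip-old-files
cat ks-cf-hd-1-Data.db        # still from node2
```

So "replace" is the default and "skip" needs an explicit flag; either way, when two nodes legitimately hold different data under the same SSTable name, one copy is lost, which is the argument for restoring into per-node directories rather than one flat folder.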