> I've implemented this with MySQL before, and it worked extremely well
> (miles beyond mysqldump or mysqlhotcopy). On a given node, you sacrifice a
> short period of availability (less than 0.5 seconds) to get a full,
> consistent snapshot of your EBS volume that can be sent off to S3 in the
> background, after the filesystem has unlocked and disk activity has
> resumed. Has anybody tried implementing this with a Cassandra cluster?
> What are the issues you ran into?

Yes, I have implemented this and it works as one would expect. I would
recommend stopping one Cassandra JVM at a time and failing fast: if
anything happens to one node, the worst case is that you have to swap out
that one node, assuming you have replication set up. I haven't run into
any issues. With the instance taken down, freezing the filesystem is just
a precautionary measure in case it accidentally gets restarted
mid-snapshot. A rough sketch of the per-node sequence follows.
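For concreteness, here is a minimal sketch of that sequence, assuming a
Python script using boto3 and subprocess; the service name, mount point,
and volume ID are illustrative placeholders, not details from my setup.

# Sketch: freeze-and-snapshot one Cassandra node's EBS data volume.
# Assumes boto3 credentials are configured, the data volume is XFS,
# and the node's EBS volume ID is known ahead of time.
import subprocess
import boto3

MOUNT_POINT = "/var/lib/cassandra"   # hypothetical mount point
VOLUME_ID = "vol-0123456789abcdef0"  # hypothetical EBS volume ID

ec2 = boto3.client("ec2")

# Stop this node's JVM first; with replication and quorum reads/writes,
# the rest of the cluster keeps serving while this node is down.
subprocess.run(["service", "cassandra", "stop"], check=True)
try:
    # Freezing is the precaution against an accidental restart mid-snapshot.
    subprocess.run(["xfs_freeze", "-f", MOUNT_POINT], check=True)
    try:
        # create_snapshot returns as soon as the snapshot is started, so
        # the freeze window stays short; the copy to S3 happens in the
        # background after the filesystem is thawed.
        snap = ec2.create_snapshot(VolumeId=VOLUME_ID,
                                   Description="cassandra node backup")
        print("started snapshot", snap["SnapshotId"])
    finally:
        subprocess.run(["xfs_freeze", "-u", MOUNT_POINT], check=True)
finally:
    subprocess.run(["service", "cassandra", "start"], check=True)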
> How did it compare with using Cassandra's "nodetool snapshot"?

They both work well but serve different needs. EC2-snapshotting your
nodes makes sense if, for example, you run the entire cluster in the same
availability zone and you want to be able to restore from backup if the
data center has an outage. It's also handy for things like "cloning" your
cluster, since you can create a new cluster of the same size and load the
data off the EC2 snapshots (a rough sketch of that is at the end of this
message). That said, EC2 machines can be very unreliable in terms of
network latency, etc., so being tightly coupled to EC2 can be risky. If
you have a rack-aware strategy with your cluster partitioned across
different availability zones, nodetool snapshot may be all you need.

> I think I could do this on a running node with a 0.5 second timeout.
> The XFS docs state "Any process attempting to write to the frozen
> filesystem will block waiting for the filesystem to be unfrozen."
> Having writes block on a node for <0.5s sounds like something Cassandra
> would handle fine.

If you just stop one instance at a time, it's a non-issue if you have,
say, RF=3 and CL=QUORUM: quorum for RF=3 is floor(3/2) + 1 = 2 replicas,
so reads and writes are simply redirected to the other up nodes while the
node is down.

> The Cassandra docs state "You can get an eventually consistent backup
> by flushing all nodes and snapshotting; no individual node's backup is
> guaranteed to be consistent but if you restore from that snapshot then
> clients will get eventually consistent behavior as usual." This led me
> to believe that as long as I have snapshotted each node in the cluster
> within a reasonable window (say 2 hours), I'd be able to bring the
> entire cluster back with a guarantee that it is consistent up to the
> point where the snapshot window began.

Right. One thing to keep in mind is that, depending on how much data you
are storing on each box, the EC2 snapshot may not finish within 2 hours.
Also, if you are using Cassandra's snapshot, you don't want to just keep
snapshotting without removing (or moving off the disk) the data of
previous snapshots: snapshots are hard links that keep old SSTables
around, so the disk will fill up quickly. The second sketch at the end
shows one way to rotate them.
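For reference, here are the two sketches mentioned above. First, the
cluster "cloning" idea: create one fresh EBS volume per node from the
backup snapshots and attach it to the corresponding new instance. The
snapshot IDs, instance IDs, and availability zone below are hypothetical;
in practice they would come from your backup records.

# Sketch: "clone" a cluster from per-node EBS snapshots.
import boto3

ec2 = boto3.client("ec2")

SNAPSHOTS = ["snap-node1", "snap-node2", "snap-node3"]   # hypothetical IDs
NEW_INSTANCES = ["i-node1", "i-node2", "i-node3"]        # hypothetical IDs
AZ = "us-east-1a"  # volumes must be created in the new nodes' zone

for snap_id, instance_id in zip(SNAPSHOTS, NEW_INSTANCES):
    vol = ec2.create_volume(SnapshotId=snap_id, AvailabilityZone=AZ)
    vol_id = vol["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol_id])
    # Each new node mounts this volume as its Cassandra data directory
    # before the JVM is started.
    ec2.attach_volume(VolumeId=vol_id, InstanceId=instance_id,
                      Device="/dev/sdf")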
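Second, the nodetool route with snapshot rotation, so old snapshots don't
pin SSTables and fill the disk. The host names and use of ssh are
assumptions for illustration, and the -t flags follow recent nodetool
releases.

# Sketch: rolling "eventually consistent" backup with nodetool,
# clearing each node's previous snapshot before taking a new one.
import subprocess

NODES = ["cass1.example.com", "cass2.example.com", "cass3.example.com"]
TAG = "nightly"  # hypothetical snapshot tag

for host in NODES:
    # Drop the previous snapshot's hard links first; otherwise old
    # SSTables stay pinned and the disk fills up quickly.
    subprocess.run(["ssh", host, "nodetool", "clearsnapshot", "-t", TAG],
                   check=False)  # fine if no prior snapshot exists
    # Flush memtables so recent writes land in SSTables, then snapshot.
    subprocess.run(["ssh", host, "nodetool", "flush"], check=True)
    subprocess.run(["ssh", host, "nodetool", "snapshot", "-t", TAG],
                   check=True)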