That will not necessarily scale, and I wouldn't recommend it - your "backup node" will need as much disk space as an entire replica of the cluster data. For a cluster with a couple of nodes that may be OK; for dozens of nodes, probably not. You also lose the ability to restore individual nodes - the only way to replace a dead node is with a full repair.
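
To put a rough number on the disk requirement, here is a back-of-the-envelope sketch. It assumes every keyspace uses the same replication factor, so the per-node loads add up to that many full copies of the data. The function name and the example cluster are made up for illustration, and compression or compaction overhead is ignored.

```python
def full_replica_bytes(per_node_load_bytes, replication_factor):
    """Estimate the size of one complete copy of the cluster's data.

    Assumes every keyspace uses the same replication factor, so the
    per-node loads add up to replication_factor full copies.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return sum(per_node_load_bytes) // replication_factor


GB = 10**9
# Hypothetical cluster: 12 nodes holding 500 GB each, replication factor 3.
loads = [500 * GB] * 12
print(full_replica_bytes(loads, 3) // GB)  # size in GB of one full replica
```

In that hypothetical cluster each node stores 500 GB, but a single backup node would have to hold roughly 2 TB, and the gap widens as nodes are added.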
On Thu, Jun 12, 2014 at 1:38 PM, Jabbar Azam <[email protected]> wrote:

> There is another way. You create a Cassandra node in its own datacentre,
> then any changes going to the main cluster will be replicated to this node.
> You can back up from this node. In the event of a disaster the data from
> both clusters is wiped and then replayed to the individual node. The data
> will then be replicated to the main cluster.
>
> This will also work for the case when the main cluster increases or
> decreases in size.
>
> Thanks
>
> Jabbar Azam
>
> On 12 June 2014 18:27, Andrew <[email protected]> wrote:
>
>> There isn’t a lot of “actual documentation” on the act of backing up, but
>> I did research for my own company into the act of backing up and
>> unfortunately, you’re not going to have a similar setup as Oracle. There
>> are reasons for this, however.
>>
>> If you have more than one replica of the data, that means each node in
>> the cluster will likely be holding its own unique set of data. So you
>> would need to back up the ENTIRE set of nodes in order to get an accurate
>> snapshot. Likewise, you would need to restore it to a cluster of the
>> same size in order to restore it (and then run refresh to tell Cassandra to
>> reload the tables from disk).
>>
>> Copying the snapshots is easy - it’s just a bunch of files in your data
>> directory. It’s even smaller if you use incremental snapshots. I’ll
>> admit, I’m no expert on tape drives, but I’d imagine it’s as easy as
>> copy/pasting the snapshots to the drive (or whatever the equivalent tape
>> drive operation is).
>>
>> What you (and I, admittedly) would really like to see is a way to back up
>> all the logical *data*, and then simply replay it. This is possible on
>> Oracle because it’s typically restricted to either one (plus maybe one or
>> two standbys) that don’t “share” any data. What you could do, in theory,
>> is literally select all the data in the entire cluster and simply dump it
>> to a file - but this could take hours, days, or even weeks to complete,
>> depending on the size of your data - and then simply re-load it. This is
>> probably not a great solution, but hey, maybe it will work for you.
>>
>> Netflix (thankfully) has posted a lot of their operational observations
>> and what not, including their utility Priam. In their documentation, they
>> include some overviews of what they use:
>> https://github.com/Netflix/Priam/wiki/Backups
>>
>> Hope this helps!
>>
>> Andrew
>>
>> On June 12, 2014 at 6:18:57 AM, Jack Krupansky ([email protected])
>> wrote:
>>
>> The doc for backing up – and restoring – Cassandra is here:
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_restore_c.html
>>
>> That doesn’t tell you how to move the “snapshot” to or from tape, but a
>> snapshot is the starting point for backing up Cassandra.
>>
>> -- Jack Krupansky
>>
>> *From:* Camacho, Maria (NSN - FI/Espoo) <[email protected]>
>> *Sent:* Thursday, June 12, 2014 4:57 AM
>> *To:* [email protected]
>> *Subject:* Backup Cassandra to
>>
>> Hi there,
>>
>> I'm trying to find information/instructions about backing up and
>> restoring a Cassandra DB to and from a tape unit.
>>
>> I was hoping someone in this forum could help me with this since I
>> could not find anything useful in Google :(
>>
>> Thanks in advance,
>>
>> Maria
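
For the "copy the snapshots" step described in the quoted thread, a minimal sketch of bundling one node's snapshot files into a single tar archive that could then be written to tape. It assumes the snapshot was already taken with a tag (for example via `nodetool snapshot -t <tag>`) and that snapshot files sit under `<data_dir>/<keyspace>/<table>/snapshots/<tag>/`; check the layout on your version. The function names are made up for illustration.

```python
import tarfile
from pathlib import Path


def find_snapshot_dirs(data_dir, tag):
    """Return the snapshot directories named `tag` under data_dir, sorted.

    Assumed layout: <data_dir>/<keyspace>/<table>/snapshots/<tag>/
    """
    root = Path(data_dir)
    return sorted(p for p in root.glob(f"*/*/snapshots/{tag}") if p.is_dir())


def archive_snapshot(data_dir, tag, out_path):
    """Write every file of the tagged snapshot into one tar archive.

    Member names are kept relative to data_dir so the keyspace/table
    layout can be recreated on restore. Returns the member names.
    """
    root = Path(data_dir)
    names = []
    with tarfile.open(out_path, "w") as tar:
        for snap in find_snapshot_dirs(root, tag):
            for f in sorted(snap.rglob("*")):
                if f.is_file():
                    arcname = f.relative_to(root).as_posix()
                    tar.add(str(f), arcname=arcname)
                    names.append(arcname)
    return names
```

A call such as `archive_snapshot("/var/lib/cassandra/data", "nightly", "/backup/node1-nightly.tar")` (hypothetical paths and tag) would have to be run on every node, since each node holds a different slice of the data.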
