You don't need sstableloader if your topology hasn't changed and you have all the sstables backed up for each node. sstableloader streams data to the appropriate nodes in the ring (this is what the OpsCenter backup restore does), so you can restore to a larger or smaller cluster, or to a cluster with different token ranges, vnodes vs. non-vnodes, etc. Note that it requires all the target nodes to be up.
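As a rough sketch of a streaming restore with sstableloader (hostnames, keyspace, and directory names here are placeholders, not from the thread), the loader is pointed at a directory whose path ends in keyspace/table so it can infer the target:

```shell
# Stream backed-up sstables into the cluster. node1,node2 are placeholder
# contact points; sstableloader discovers the ring from them and streams
# each sstable's data to whichever nodes now own the matching token ranges,
# so the target cluster can differ in size/topology from the source.
# All target nodes must be up.
sstableloader -d node1,node2 /backups/restore_staging/mykeyspace/mytable/
```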
If you have all the sstables for each node and no token range changes, you can simply move the sstables back into place in each node's data directory (with rsync or whatever) and bring up your nodes. If the nodes are already up, you can use nodetool refresh to load the sstables without a restart:
http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsRefresh.html

All the best,

Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Thu, Jun 4, 2015 at 5:39 AM, ZeroUno <zerozerouno...@gmail.com> wrote:

> Hi,
> while defining backup and restore procedures for a Cassandra cluster I'm
> trying to use sstableloader for restoring a snapshot from a backup, but
> I'm not sure I fully understand the documentation on how it should be
> used.
>
> Looking at the examples in the doc at
> http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsBulkloader_t.html
> it seems like the path_to_keyspace to be passed as an argument is exactly
> the cassandra data directory. So, do you first move the data into its
> final target location and then stream it to the cluster again?
>
> Let's take a step back. My cluster is composed of two data centers.
> Each data center has two nodes (nodeA1 and nodeA2 for center A, nodeB1
> and nodeB2 for center B).
> I'm using NetworkTopologyStrategy with RF=2.
>
> For periodic backups I create a snapshot simultaneously on the two nodes
> of a single data center (nodeA1 and nodeA2), and then move the snapshot
> files to a safe place.
> To simulate a disaster recovery situation, I truncate all tables to erase
> the data (but not the schema, which would be re-created anyway by my
> application), stop cassandra on all 4 nodes, move the snapshot backup
> files back to their original locations (e.g.
> /mydatapath/cassandra/data/mykeyspace/mytable1/) on nodeA1 and nodeA2,
> and then restart cassandra on all 4 nodes.
>
> At last, I run:
>
>> sstableloader -d nodeA1,nodeA2,nodeB1,nodeB2 /mydatapath/cassandra/data/mykeyspace/mytable1/
>> sstableloader -d nodeA1,nodeA2,nodeB1,nodeB2 /mydatapath/cassandra/data/mykeyspace/mytable2/
>> sstableloader -d nodeA1,nodeA2,nodeB1,nodeB2 /mydatapath/cassandra/data/mykeyspace/mytable3/
>> [...and so on for all tables]
>
> ...on both nodeA1 and nodeA2, where I restored the snapshot.
>
> Is that correct?
>
> I observed some strange behaviour after doing this: when I truncated the
> tables again, a select count(*) on one of the A nodes still returned a
> non-zero number, as if the data was still there.
> I started thinking that maybe the source sstable directory for
> sstableloader should not be the data directory itself, as this causes
> some kind of "double data" problem...
>
> Can anyone please tell me if this is the correct way to proceed?
> Thank you very much!
>
> --
> 01
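For the unchanged-topology case the reply describes (every node's sstables backed up, same token ranges), the copy-and-refresh restore might look like this minimal per-node sketch; the backup path is a placeholder, while the data path and names follow the example in the question:

```shell
# Run on each node, restoring that node's own backed-up sstables.
# /backups/$(hostname)/... is a placeholder for wherever the per-node
# snapshot files were kept.
rsync -av /backups/$(hostname)/mykeyspace/mytable1/ \
          /mydatapath/cassandra/data/mykeyspace/mytable1/

# If the node is already running, load the newly placed sstables without
# a restart (nodetool refresh <keyspace> <table>); otherwise just start
# cassandra and it will pick them up on boot.
nodetool refresh mykeyspace mytable1
```

No streaming happens here, which is why this avoids the "double data" effect seen when pointing sstableloader at the live data directory: that both leaves the files in place and streams another copy of them to the replicas.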