Hi David, I'm not running Cassandra 2.0.2, but I'm used to moving data from one Cassandra cluster with vnodes to another.
I would back up cluster A the same way you do. To restore onto cluster B, I do the following steps:

1. Deploy 5 nodes as part of the cluster-B ring.
2. Create the keyspace_name keyspace and column families on cluster-B.
3. Copy the backup of each source node onto one node, into:
     /tmp/node1/Keyspace_name/cf_name/
     /tmp/node2/Keyspace_name/cf_name/
     /tmp/node3/Keyspace_name/cf_name/
     /tmp/node4/Keyspace_name/cf_name/
     /tmp/node5/Keyspace_name/cf_name/
4. Use sstableloader to load the sstables from each directory. sstableloader guarantees that the data ends up on the right nodes.
5. Run a repair on each node.

sstableloader is the right tool for this kind of operation.

Good luck :)
Julien Campan.

2013/11/19 Aaron Morton <aa...@thelastpickle.com>

> we then take the snapshot archive generated FROM cluster-A_node1 and
> copy/extract/restore TO cluster-B_node1, then we
>
> sounds correct.
>
> Depending on what additional comments/recommendation you or another member
> of the list may have (if any) based on the clarification I've made above,
>
> Also if you backup the system data it will bring along the tokens. This
> can be a pain if you want to change the cluster name.
>
> cheers
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 15/11/2013, at 10:44 am, David Laube <d...@stormpath.com> wrote:
>
> Thank you for the detailed reply Rob! I have replied to your comments
> in-line below;
>
> On Nov 14, 2013, at 1:15 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
> On Thu, Nov 14, 2013 at 12:37 PM, David Laube <d...@stormpath.com> wrote:
>
>> It is almost as if the data only exists on some of the nodes, or perhaps
>> the token ranges are dramatically different --again, we are using vnodes so
>> I am not exactly sure how this plays into the equation.
>
> The token ranges are dramatically different, due to vnode random token
> selection from not setting initial_token, and setting num_tokens.
>
> You can verify this by listing the tokens per physical node in nodetool
> gossipinfo or (iirc) nodetool status.
>
>> 5. Copy 1 of the 5 snapshot archives from cluster-A to each of the five
>> nodes in the new cluster-B ring.
>>
>
> I don't understand this at all, do you mean that you are using one source
> node's data to load each of of the target nodes? Or are you just saying
> there's a 1:1 relationship between source snapshots and target nodes to
> load into? Unless you have RF=N, using one source for 5 target nodes won't
> work.
>
> We have configured RF=3 for the keyspace in question. Also, from a client
> perspective, we read with CL=1 and write with CL=QUORUM. Since we have 5
> nodes total in cluster-A, we snapshot keyspace_name on each of the five
> nodes which results in a snapshot directory on each of the five nodes that
> we archive and ship off to s3. We then take the snapshot archive generated
> FROM cluster-A_node1 and copy/extract/restore TO cluster-B_node1, then
> we take the snapshot archive FROM cluster-A_node2 and copy/extract/restore
> TO cluster-B_node2 and so on and so forth.
>
> To do what I think you're attempting to do, you have basically two options.
>
> 1) don't use vnodes and do a 1:1 copy of snapshots
> 2) use vnodes and
>    a) get a list of tokens per node from the source cluster
>    b) put a comma delimited list of these in initial_token in
>       cassandra.yaml on target nodes
>    c) probably have to un-set num_tokens (this part is unclear to me, you
>       will have to test..)
>    d) set auto_bootstrap:false in cassandra.yaml
>    e) start target nodes, they will not-bootstrap into the same ranges as
>       the source cluster
>    f) load schema / copy data into datadir (being careful of
>       https://issues.apache.org/jira/browse/CASSANDRA-6245)
>    g) restart node or use nodetool refresh (I'd probably restart the node
>       to avoid the bulk rename that refresh does) to pick up sstables
>    h) remove auto_bootstrap:false from cassandra.yaml
>
> I *believe* this *should* work, but have never tried it as I do not
> currently run with vnodes. It should work because it basically makes
> implicit vnode tokens explicit in the conf file. If it *does* work, I'd
> greatly appreciate you sharing details of your experience with the list.
>
> I'll start with parsing out the token ranges that our vnode config ends up
> assigning in cluster-A, and doing some creative config work on the target
> cluster-B we are trying to restore to as you have suggested. Depending on
> what additional comments/recommendation you or another member of the list
> may have (if any) based on the clarification I've made above, I will
> absolutely report back my findings here.
>
> General reference on tasks of this nature (does not consider vnodes, but
> treat vnodes as "just a lot of physical nodes" and it is mostly relevant) :
> http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra
>
> =Rob
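
For reference, the two restore approaches discussed in this thread (loading each node's backup directory with sstableloader, and making the source tokens explicit in initial_token) can be sketched in Python. This is only a sketch under assumptions: the function names, the contact host name, and the token values are hypothetical. It assumes the per-node token lists have already been collected from the source cluster by some other means, and that sstableloader's -d option is given a live node of the target cluster. The code only builds the strings; it does not run anything or edit cassandra.yaml.

```python
from typing import List


def initial_token_line(tokens: List[int]) -> str:
    """Build an initial_token entry for one target node.

    The value is a comma-delimited list of the tokens owned by the
    matching source node. Collecting those tokens is out of scope here.
    """
    if not tokens:
        raise ValueError("a node needs at least one token")
    return "initial_token: " + ",".join(str(t) for t in tokens)


def sstableloader_commands(contact_host: str, base_dir: str,
                           node_dirs: List[str], keyspace: str,
                           table: str) -> List[str]:
    """Build one sstableloader command per copied backup directory.

    Directories follow the <base>/<node>/<keyspace>/<table>/ layout
    used in the steps above.
    """
    return [
        f"sstableloader -d {contact_host} {base_dir}/{node}/{keyspace}/{table}/"
        for node in node_dirs
    ]


if __name__ == "__main__":
    # Hypothetical token values, for illustration only.
    print(initial_token_line([-9000, 15, 7200]))
    for cmd in sstableloader_commands(
        "cluster-b-node1", "/tmp",
        [f"node{i}" for i in range(1, 6)],
        "Keyspace_name", "cf_name",
    ):
        print(cmd)
```

Printing the commands first, instead of executing them, makes it easy to check the paths against the directories that were actually copied before loading anything.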