Re: Read inconsistency after backup and restore to different cluster

Julien Campan Wed, 20 Nov 2013 04:46:36 -0800

Hi David,

I'm not running Cassandra 2.0.2, but I regularly move data from one
Cassandra cluster with vnodes to another.


I would back up cluster A the same way you do.

To restore to cluster B, I use the following steps:

1. Deploy 5 nodes as part of the cluster-B ring.
2. Create the keyspace_name keyspace and its column families on cluster-B.
3. Copy the backup of each cluster-A node onto a single node, one directory
per source node:
        /tmp/node1/Keyspace_name/cf_name/
        /tmp/node2/Keyspace_name/cf_name/
        /tmp/node3/Keyspace_name/cf_name/
        /tmp/node4/Keyspace_name/cf_name/
        /tmp/node5/Keyspace_name/cf_name/
4. Use sstableloader to load the sstables from each directory. sstableloader
takes care of streaming the data to the nodes that own it in the target ring.
5. Run a repair on each node.
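
Step 4 could be scripted along these lines. This is only a sketch in Python:
it builds one sstableloader command per copied directory and prints it, it
does not run anything. The host address 10.0.0.1 and the helper name
loader_commands are placeholders of mine; -d is the sstableloader option
that names an initial host in the target cluster.

```python
from pathlib import PurePosixPath


def loader_commands(base, keyspace, table, host, node_count=5):
    """Build one sstableloader command per copied node backup.

    Assumes the layout <base>/nodeN/<keyspace>/<table>/ from step 3.
    `host` is any live node of the target cluster (placeholder value below).
    """
    commands = []
    for i in range(1, node_count + 1):
        sstable_dir = PurePosixPath(base) / f"node{i}" / keyspace / table
        commands.append(["sstableloader", "-d", host, str(sstable_dir)])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in loader_commands("/tmp", "Keyspace_name", "cf_name", "10.0.0.1"):
        print(" ".join(cmd))
```

Review the printed commands and run them one at a time, so that a failed
load is easy to spot before moving on to the repair in step 5.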


Sstableloader is the right tool for this kind of operation.


Good luck  :)


Julien Campan.


2013/11/19 Aaron Morton <aa...@thelastpickle.com>

>> we then take the snapshot archive generated FROM cluster-A_node1 and
>> copy/extract/restore TO cluster-B_node1, then we
>
> sounds correct.
>
>> Depending on what additional comments/recommendation you or another member
>> of the list may have (if any) based on the clarification I've made above,
>
>
> Also, if you back up the system data it will bring along the tokens. This
> can be a pain if you want to change the cluster name.
>
> cheers
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 15/11/2013, at 10:44 am, David Laube <d...@stormpath.com> wrote:
>
> Thank you for the detailed reply Rob! I have replied to your comments
> in-line below:
>
> On Nov 14, 2013, at 1:15 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
> On Thu, Nov 14, 2013 at 12:37 PM, David Laube <d...@stormpath.com> wrote:
>
>> It is almost as if the data only exists on some of the nodes, or perhaps
>> the token ranges are dramatically different --again, we are using vnodes so
>> I am not exactly sure how this plays into the equation.
>
>
> The token ranges are dramatically different because vnodes select tokens
> at random when num_tokens is set and initial_token is not.
>
> You can verify this by listing the tokens per physical node in nodetool
> gossipinfo or (iirc) nodetool status.
>
>
>> 5. Copy 1 of the 5 snapshot archives from cluster-A to each of the five
>> nodes in the new cluster-B ring.
>>
>
> I don't understand this at all, do you mean that you are using one source
> node's data to load each of the target nodes? Or are you just saying
> there's a 1:1 relationship between source snapshots and target nodes to
> load into? Unless you have RF=N, using one source for 5 target nodes won't
> work.
>
>
> We have configured RF=3 for the keyspace in question. Also, from a client
> perspective, we read with CL=1 and write with CL=QUORUM. Since we have 5
> nodes total in cluster-A, we snapshot keyspace_name on each of the five
> nodes which results in a snapshot directory on each of the five nodes that
> we archive and ship off to s3. We then take the snapshot archive generated
> FROM cluster-A_node1 and copy/extract/restore TO cluster-B_node1,  then
> we take the snapshot archive FROM cluster-A_node2 and copy/extract/restore
> TO cluster-B_node2 and so on and so forth.
>
>
> To do what I think you're attempting to do, you have basically two options.
>
> 1) don't use vnodes and do a 1:1 copy of snapshots
> 2) use vnodes and
>    a) get a list of tokens per node from the source cluster
>    b) put a comma delimited list of these in initial_token in
> cassandra.yaml on target nodes
>    c) probably have to un-set num_tokens (this part is unclear to me, you
> will have to test..)
>    d) set auto_bootstrap:false in cassandra.yaml
>    e) start target nodes; they will skip bootstrapping and take the same
> ranges as the source cluster
>    f) load schema / copy data into datadir (being careful of
> https://issues.apache.org/jira/browse/CASSANDRA-6245)
>    g) restart node or use nodetool refresh (I'd probably restart the node
> to avoid the bulk rename that refresh does) to pick up sstables
>    h) remove auto_bootstrap:false from cassandra.yaml
>
> I *believe* this *should* work, but have never tried it as I do not
> currently run with vnodes. It should work because it basically makes
> implicit vnode tokens explicit in the conf file. If it *does* work, I'd
> greatly appreciate you sharing details of your experience with the list.
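
Steps (a) and (b) above could be sketched as follows, in Python. The sketch
groups tokens by node address from saved `nodetool ring` output and formats
them as a comma-delimited initial_token line. It assumes each token row
starts with an IPv4 address and ends with the token, which should be checked
against the output of the version in use. The function names are
illustrative only.

```python
import re
from collections import defaultdict

# Assumed row shape: "<ipv4 address> ... <token>"; other lines are ignored.
TOKEN_LINE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s.*\s(-?\d+)\s*$")


def tokens_by_node(ring_output):
    """Map each node address to its tokens, in the order they are listed."""
    tokens = defaultdict(list)
    for line in ring_output.splitlines():
        match = TOKEN_LINE.match(line.strip())
        if match:
            address, token = match.groups()
            tokens[address].append(token)
    return dict(tokens)


def initial_token_line(tokens):
    """Format one node's tokens as a cassandra.yaml initial_token entry."""
    return "initial_token: " + ",".join(tokens)
```

Each source node's line would then go into cassandra.yaml on the matching
target node. Whether num_tokens also has to be unset (step c) still needs
to be tested.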
>
>
> I'll start with parsing out the token ranges that our vnode config ends up
> assigning in cluster-A, and doing some creative config work on the target
> cluster-B we are trying to restore to as you have suggested. Depending on
> what additional comments/recommendation you or another member of the list
> may have (if any) based on the clarification I've made above, I will
> absolutely report back my findings here.
>
>
>
> General reference on tasks of this nature (does not consider vnodes, but
> treat vnodes as "just a lot of physical nodes" and it is mostly relevant) :
> http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra
>
> =Rob
>
>
>
