So can one just take all of the *.db files from all the machines in a cluster, put them in a folder together (renaming ones with the same number?) and start up a node which will then have access to all the data?
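Mechanically, yes — but the renaming matters, because every component of one SSTable (Data, Index, Filter) must keep the same generation number, and files from different nodes routinely clash on that number. Here is a minimal sketch of that merge-with-renaming step; the directory names, column family, and demo files are made-up assumptions for illustration, not anything from this thread:

```shell
#!/bin/sh
# Hypothetical sketch: merge SSTable files from several nodes' data
# directories into one, giving each SSTable a fresh generation number so
# same-numbered files do not clash. 0.7-era names look like
# <ColumnFamily>-<generation>-<Component>.db, and all components of one
# SSTable must share the same generation.
set -e

# --- demo input: two "nodes" that both have a generation-1 SSTable ---------
for n in node1 node2; do
    mkdir -p "$n"
    for comp in Data Index Filter; do
        : > "$n/Standard1-1-$comp.db"
    done
done

# --- merge with renaming ----------------------------------------------------
DEST=merged
mkdir -p "$DEST"
next=1
for srcdir in node1 node2; do
    # each Data file marks one SSTable; pull out its generation number
    for gen in $(ls "$srcdir" | sed -n 's/^.*-\([0-9][0-9]*\)-Data\.db$/\1/p'); do
        # copy every component of that SSTable under one new generation
        for f in "$srcdir"/*-"$gen"-*.db; do
            base=$(basename "$f")
            cf=${base%%-*}                   # column family name
            comp=${base#*-*-}                # component, e.g. Data.db
            cp "$f" "$DEST/$cf-$next-$comp"
        done
        next=$((next + 1))
    done
done

ls "$DEST"
```

After the merge, `merged/` holds both nodes' SSTables as generations 1 and 2 with no collisions. The key point the script illustrates is that a generation is renumbered as a unit, never file by file.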
-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Wednesday, March 16, 2011 1:59 PM
To: user@cassandra.apache.org
Cc: Jedd Rashbrooke
Subject: Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

That should work then, assuming SimpleStrategy/RackUnawareStrategy.
Otherwise figuring out which machines share which data gets complicated.

Note that if you have room on the machines, it's going to be faster to
copy the entire data set to each machine and run cleanup, than to have
repair fix 3 of 4 replicas from scratch. Repair would work, eventually,
but it's kind of a worst-case scenario for it.

On Mon, Mar 14, 2011 at 10:39 AM, Jedd Rashbrooke <j...@visualdna.com> wrote:
> Jonathan, thank you for your answers here.
>
> To explain this bit ...
>
> On 11 March 2011 20:46, Jonathan Ellis <jbel...@gmail.com> wrote:
>> On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke <j...@visualdna.com> wrote:
>>> Copying a cluster between AWS DCs:
>>> We have ~150-250 GB per node, with a replication factor of 4.
>>> I accept that 0.6 -> 0.7 is necessarily stop-the-world, so in an
>>> attempt to minimise that outage period I was wondering if it's
>>> possible to drain and stop the cluster, then copy over only the
>>> 1st, 5th, 9th, and 13th nodes' worth of data (which should be a
>>> full copy of all our actual data - we are nicely partitioned,
>>> despite the disparity in GB per node) and have Cassandra
>>> re-populate the new destination 16 nodes from those four data
>>> sets. If this is feasible, is it likely to be more expensive (in
>>> terms of time the new cluster is unresponsive as it rebuilds)
>>> than just copying across all 16 sets of data - about 2.7 TB.
>>
>> I'm confused. You're trying to upgrade and add a DC at the same time?
> Yeah, I know, it's probably not the sanest route - but the hardware
> (virtualised, Amazonish EC2 that it is) will be the same between
> the two sites, so that reduces some of the usual roll-in / roll-out
> migration risk.
>
> But more importantly for us it would mean we'd have just the
> one major outage, rather than two (relocation and 0.6 -> 0.7).
>
> cheers,
> Jedd.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
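For reference, Jonathan's "copy everything and run cleanup" suggestion above might look roughly like the outline below. This is an illustrative runbook sketch only, not a tested procedure from the thread: host names, paths, and the assumption of default data directories are all placeholders. `nodetool drain` and `nodetool cleanup` are real subcommands in this era of Cassandra.

```shell
# Illustrative outline only; hosts and paths are placeholders.

# 1. On each source node: flush memtables and stop accepting writes,
#    then shut the node down.
nodetool -h source-node drain

# 2. Copy the full data set (all source nodes' SSTables) to every
#    destination node. Files from different nodes can share generation
#    numbers, so clashing names must be renamed consistently first.
rsync -av /var/lib/cassandra/data/ dest-node:/var/lib/cassandra/data/

# 3. Start Cassandra on the destination nodes, then discard the data
#    each node does not own under its token.
nodetool -h dest-node cleanup
```

The trade-off Jonathan describes is that cleanup only deletes locally held data the node no longer owns, which is far cheaper than repair reconstructing 3 of 4 replicas over the network.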