So can one just take all of the *.db files from all the machines in a cluster, put them in a folder together (renaming ones with the same number?) and start up a node which will then have access to all the data?
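Mechanically, yes — but the renaming matters, because every component of one SSTable (Data, Index, Filter) must keep the same generation number, and files from different nodes routinely clash on that number. Here is a minimal sketch of that merge-with-renaming step; the directory names, column family, and demo files are made-up assumptions for illustration, not anything from this thread:

```shell
#!/bin/sh
# Hypothetical sketch: merge SSTable files from several nodes' data
# directories into one, giving each SSTable a fresh generation number so
# same-numbered files do not clash. 0.7-era names look like
# <ColumnFamily>-<generation>-<Component>.db, and all components of one
# SSTable must share the same generation.
set -e

# --- demo input: two "nodes" that both have a generation-1 SSTable ---------
for n in node1 node2; do
    mkdir -p "$n"
    for comp in Data Index Filter; do
        : > "$n/Standard1-1-$comp.db"
    done
done

# --- merge with renaming ----------------------------------------------------
DEST=merged
mkdir -p "$DEST"
next=1
for srcdir in node1 node2; do
    # each Data file marks one SSTable; pull out its generation number
    for gen in $(ls "$srcdir" | sed -n 's/^.*-\([0-9][0-9]*\)-Data\.db$/\1/p'); do
        # copy every component of that SSTable under one new generation
        for f in "$srcdir"/*-"$gen"-*.db; do
            base=$(basename "$f")
            cf=${base%%-*}                   # column family name
            comp=${base#*-*-}                # component, e.g. Data.db
            cp "$f" "$DEST/$cf-$next-$comp"
        done
        next=$((next + 1))
    done
done

ls "$DEST"
```

After the merge, `merged/` holds both nodes' SSTables as generations 1 and 2 with no collisions. The key point the script illustrates is that a generation is renumbered as a unit, never file by file.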
-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Wednesday, March 16, 2011 1:59 PM
To: user@cassandra.apache.org
Cc: Jedd Rashbrooke
Subject: Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

That should work then, assuming SimpleStrategy/RackUnawareStrategy.
Otherwise figuring out which machines share which data gets complicated.

Note that if you have room on the machines, it's going to be faster to
copy the entire data set to each machine and run cleanup, than to have
repair fix 3 of 4 replicas from scratch. Repair would work, eventually,
but it's kind of a worst-case scenario for it.

On Mon, Mar 14, 2011 at 10:39 AM, Jedd Rashbrooke <j...@visualdna.com> wrote:
> Jonathan, thank you for your answers here.
>
> To explain this bit ...
>
> On 11 March 2011 20:46, Jonathan Ellis <jbel...@gmail.com> wrote:
>> On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke <j...@visualdna.com> wrote:
>>> Copying a cluster between AWS DCs:
>>> We have ~150-250 GB per node, with a replication factor of 4.
>>> I accept that 0.6 -> 0.7 is necessarily stop-the-world, so in an
>>> attempt to minimise that outage period I was wondering if it's
>>> possible to drain and stop the cluster, then copy over only the
>>> 1st, 5th, 9th, and 13th nodes' worth of data (which should be a
>>> full copy of all our actual data - we are nicely partitioned,
>>> despite the disparity in GB per node) and have Cassandra
>>> re-populate the new destination 16 nodes from those four data
>>> sets. If this is feasible, is it likely to be more expensive (in
>>> terms of time the new cluster is unresponsive as it rebuilds)
>>> than just copying across all 16 sets of data - about 2.7 TB.
>>
>> I'm confused. You're trying to upgrade and add a DC at the same time?
> Yeah, I know, it's probably not the sanest route - but the hardware
> (virtualised, Amazonish EC2 that it is) will be the same between
> the two sites, so that reduces some of the usual roll-in / roll-out
> migration risk.
>
> But more importantly for us it would mean we'd have just the
> one major outage, rather than two (relocation and 0.6 -> 0.7).
>
> cheers,
> Jedd.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
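For reference, Jonathan's "copy everything and run cleanup" suggestion above might look roughly like the outline below. This is an illustrative runbook sketch only, not a tested procedure from the thread: host names, paths, and the assumption of default data directories are all placeholders. `nodetool drain` and `nodetool cleanup` are real subcommands in this era of Cassandra.

```shell
# Illustrative outline only; hosts and paths are placeholders.

# 1. On each source node: flush memtables and stop accepting writes,
#    then shut the node down.
nodetool -h source-node drain

# 2. Copy the full data set (all source nodes' SSTables) to every
#    destination node. Files from different nodes can share generation
#    numbers, so clashing names must be renamed consistently first.
rsync -av /var/lib/cassandra/data/ dest-node:/var/lib/cassandra/data/

# 3. Start Cassandra on the destination nodes, then discard the data
#    each node does not own under its token.
nodetool -h dest-node cleanup
```

The trade-off Jonathan describes is that cleanup only deletes locally held data the node no longer owns, which is far cheaper than repair reconstructing 3 of 4 replicas over the network.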