Right. The only subtlety is the system keyspace; the cleanest approach is to start from scratch there (which means rebuilding the schema), but you could also start with a copy of just one existing node's system keyspace and bring the node up with -Dcassandra.load_ring_state=false.
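
To make that concrete, here's a rough, untested sketch in Python of the whole sequence (consolidating the copied *.db files under fresh generation numbers so nothing collides, wiping the copied system keyspace, and starting the node with the flag). The filename pattern, the paths, and the JVM_OPTS mechanism are assumptions on my part; check them against your own 0.7 install before trusting it:

#!/usr/bin/env python
# Rough sketch only: consolidate SSTables copied from several source nodes
# into one node's data directory, giving each SSTable a fresh generation
# number so files never collide, then wipe the copied system keyspace and
# start Cassandra ignoring saved ring state.
# The SSTable filename pattern ("<CF>-<gen>-<Component>.db"), the paths, and
# the JVM_OPTS mechanism are assumptions; verify them against your install.
import os
import re
import shutil
import subprocess
from collections import defaultdict

SOURCE_DIRS = ["/backup/node01/MyKeyspace",   # placeholder paths, one per
               "/backup/node05/MyKeyspace",   # source node you chose to copy
               "/backup/node09/MyKeyspace",
               "/backup/node13/MyKeyspace"]
DEST_DIR = "/var/lib/cassandra/data/MyKeyspace"
SYSTEM_DIR = "/var/lib/cassandra/data/system"
CASSANDRA_BIN = "/opt/cassandra/bin/cassandra"

SSTABLE_RE = re.compile(r"^(?P<cf>.+)-(?P<gen>\d+)-(?P<component>[A-Za-z]+)\.db$")

def consolidate():
    # Each (source dir, column family, generation) group is one SSTable's set
    # of components (Data, Index, Filter, ...); give the whole group one new
    # generation number in the destination so nothing gets overwritten.
    if not os.path.isdir(DEST_DIR):
        os.makedirs(DEST_DIR)
    next_gen = defaultdict(int)
    for src in SOURCE_DIRS:
        groups = defaultdict(list)
        for name in os.listdir(src):
            m = SSTABLE_RE.match(name)
            if m:
                groups[(m.group("cf"), m.group("gen"))].append(name)
        for (cf, _gen), names in sorted(groups.items()):
            next_gen[cf] += 1
            for name in names:
                component = SSTABLE_RE.match(name).group("component")
                dest_name = "%s-%d-%s.db" % (cf, next_gen[cf], component)
                shutil.copy2(os.path.join(src, name),
                             os.path.join(DEST_DIR, dest_name))

def start_fresh():
    # Drop the copied system keyspace so the node rebuilds it (you then
    # re-create the schema), and pass -Dcassandra.load_ring_state=false so no
    # stale ring/token state is loaded. If your cassandra-env.sh overwrites
    # JVM_OPTS instead of appending to it, put the flag there instead.
    if os.path.isdir(SYSTEM_DIR):
        shutil.rmtree(SYSTEM_DIR)
    env = dict(os.environ)
    env["JVM_OPTS"] = env.get("JVM_OPTS", "") + " -Dcassandra.load_ring_state=false"
    subprocess.check_call([CASSANDRA_BIN], env=env)

if __name__ == "__main__":
    consolidate()
    start_fresh()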
On Fri, Mar 18, 2011 at 2:29 PM, Jeremiah Jordan <jeremiah.jor...@morningstar.com> wrote:
> So can one just take all of the *.db files from all the machines in a
> cluster, put them in a folder together (renaming ones with the same number?)
> and start up a node which will then have access to all the data?
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: Wednesday, March 16, 2011 1:59 PM
> To: user@cassandra.apache.org
> Cc: Jedd Rashbrooke
> Subject: Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising
> data transfer
>
> That should work then, assuming SimpleStrategy/RackUnawareStrategy.
> Otherwise figuring out which machines share which data gets
> complicated.
>
> Note that if you have room on the machines, it's going to be faster to
> copy the entire data set to each machine and run cleanup, than to have
> repair fix 3 of 4 replicas from scratch. Repair would work,
> eventually, but it's kind of a worst-case scenario for it.
>
> On Mon, Mar 14, 2011 at 10:39 AM, Jedd Rashbrooke <j...@visualdna.com> wrote:
>> Jonathon, thank you for your answers here.
>>
>> To explain this bit ...
>>
>> On 11 March 2011 20:46, Jonathan Ellis <jbel...@gmail.com> wrote:
>>> On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke <j...@visualdna.com> wrote:
>>>> Copying a cluster between AWS DC's:
>>>> We have ~ 150-250GB per node, with a Replication Factor of 4.
>>>> I ack that 0.6 -> 0.7 is necessarily STW, so in an attempt to
>>>> minimise that outage period I was wondering if it's possible to
>>>> drain & stop the cluster, then copy over only the 1st, 5th, 9th,
>>>> and 13th nodes' worth of data (which should be a full copy of
>>>> all our actual data - we are nicely partitioned, despite the
>>>> disparity in GB per node) and have Cassandra re-populate the
>>>> new destination 16 nodes from those four data sets. If this is
>>>> feasible, is it likely to be more expensive (in terms of time the
>>>> new cluster is unresponsive as it rebuilds) than just copying
>>>> across all 16 sets of data - about 2.7TB.
>>>
>>> I'm confused. You're trying to upgrade and add a DC at the same time?
>>
>> Yeah, I know, it's probably not the sanest route - but the hardware
>> (virtualised, Amazonish EC2 that it is) will be the same between
>> the two sites, so that reduces some of the usual roll in / roll out
>> migration risk.
>>
>> But more importantly for us it would mean we'd have just the
>> one major outage, rather than two (relocation and 0.6 -> 0.7)
>>
>> cheers,
>> Jedd.
>>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com