Jeremiah,
Thanks!
I'm running 1.0.8, two interesting things to note:
- I don't have sufficient disk space to handle a straight bump to a
replication factor of 4, so I think I'm going to have to do it one
step at a time (1, 2, 3, then 4) with a round of cleanups in between.
- Also, LOCAL_QUORUM doesn't work for me: my application has a hard
response-time limit, and with a quorum my read latency ends up being
that of the slowest responding node. What I want is LOCAL_ONE, which
doesn't exist in the API (unless I missed something).
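For reference, the one-step-at-a-time plan from the first point could be
sketched as a dry-run shell loop like the one below. The keyspace name and
node address are placeholders, the strategy-option syntax is my assumption
for cassandra-cli on 1.0.x, and the script only echoes each command rather
than running it:

```shell
#!/bin/sh
# Dry-run sketch: bump dc2's replication factor one step at a time,
# with a repair to stream data in and a cleanup between bumps.
# KEYSPACE, the dc1:2 setting, and <dc2-node> are placeholders.
KEYSPACE=my_keyspace
for rf in 1 2 3 4; do
  # Raise dc2's replication factor by one step.
  echo "cassandra-cli> update keyspace $KEYSPACE with strategy_options={dc1:2,dc2:$rf};"
  # Stream the newly owned replicas onto each dc2 node.
  echo "nodetool -h <dc2-node> repair $KEYSPACE"
  # Reclaim disk space between bumps, per the plan above.
  echo "nodetool -h <dc2-node> cleanup $KEYSPACE"
done
```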
Yes, CASSANDRA-3483 is really what I'm looking for.
--david
On 3/5/12 8:02 AM, Jeremiah Jordan wrote:
You need to make sure your clients are reading using LOCAL_* settings
so that they don't try to get data from the other data center. But
you shouldn't get errors while the replication factor is 0. Once you
change the replication factor to 4, you could get missing data on
LOCAL_* reads in DC2 until the repairs finish.
What version are you using?
See the IRC logs at the beginning of this JIRA discussion thread for
some info:
https://issues.apache.org/jira/browse/CASSANDRA-3483
But you should be able to:
1. Set dc2:0 in the strategy options (the per-DC replication factor).
2. Set bootstrap to false on the new nodes.
3. Start all of the new nodes.
4. Change the replication factor to dc2:4.
5. Run repair on the nodes in dc2.
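A dry-run sketch of those steps, which just echoes each command instead of
executing it (the keyspace name, the dc1:2 setting, and the host names are
placeholders, and the cassandra-cli syntax is my assumption for 1.0.x):

```shell
#!/bin/sh
# Echo the add-a-second-datacenter steps; nothing here is executed.
KEYSPACE=my_keyspace
# 1. Define dc2 with a replication factor of 0 before any dc2 node joins.
echo "cassandra-cli> update keyspace $KEYSPACE with strategy_options={dc1:2,dc2:0};"
# 2. Disable bootstrap on each new node before starting it.
echo "edit cassandra.yaml on each dc2 node: auto_bootstrap: false"
# 3. Start all of the new nodes; with dc2:0 they receive no replicas yet.
echo "start the cassandra service on each dc2 node"
# 4. Raise dc2 to its target replication factor.
echo "cassandra-cli> update keyspace $KEYSPACE with strategy_options={dc1:2,dc2:4};"
# 5. Repair each dc2 node so it streams its replicas over.
for host in dc2-node1 dc2-node2; do
  echo "nodetool -h $host repair $KEYSPACE"
done
```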
Once the repairs finish you should be able to start using DC2. You
are still going to need a bunch of extra space, because the repair
will temporarily leave you with extra copies of the data.
Once 1.1 comes out it will have new nodetool commands for making this
a little nicer per CASSANDRA-3483
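Once 1.1 is out, the repair step could presumably be replaced by the
nodetool rebuild command that CASSANDRA-3483 adds, run on each new node
with the source data center as its argument. A dry-run sketch (the host
and DC names are placeholders, and the exact syntax is an assumption
until 1.1 ships):

```shell
#!/bin/sh
# Sketch of the 1.1-era alternative: stream a new node's ranges
# directly from the existing data center instead of running repair.
for host in dc2-node1 dc2-node2; do
  echo "nodetool -h $host rebuild DC1"
done
```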
-Jeremiah
On 03/05/2012 09:42 AM, David Koblas wrote:
Everything that I've read about data centers focuses on setting
things up at the beginning of time.
I have the following situation:
10 machines in a datacenter (DC1), with replication factor of 2.
I want to set up a second data center (DC2) with the following
configuration:
20 machines with a replication factor of 4
What I've found is that if I just start adding nodes, the first
machine to join the cluster attempts to replicate all of the data
from DC1 and fills up its disk. I've played with setting the
strategy options to give DC2 a replication factor of 0; then I can
bring up all 20 machines in DC2, but I start getting a huge number
of read errors on DC1.
Is there a simple cookbook for adding a second DC? I'm currently
trying to set the replication factor to 1 and run a repair, but that
doesn't feel like the right approach.
Thanks,