I do not know of any articles I could send your way, and others may have some 
tales from running production systems. But here are a few thoughts; others, 
please correct me if I am wrong:

- The replication factor is not intended to be changed on a running system. It 
can be, but it will be a heavyweight process: 
http://wiki.apache.org/cassandra/Operations#Replication

- When adding nodes to a cluster it's more efficient if you can change the 
ranges of the existing nodes to be a subset of what they were responsible for 
previously. That way a node only has to stream out data, rather than stream out 
and stream in data. Say you have this contrived example (where values are real 
numbers between 1 and 10):
        - node a -> values 1 to 3
        - node b -> values 4 to 6
        - node c -> values 7 to 10

        And you want to add node d:
        - If you add node d to handle values between 2 and 3, node a can stream 
that data over to node d and then delete the data it is no longer responsible 
for. 
        - If you want a more balanced ring, you may want to change all the 
ranges to be:
                - node a -> values 1 to 2.5
                - node b -> values 2.5 to 5.0
                - node c -> values 5.0 to 7.5
                - node d -> values 7.5 to 10
        In this case there are a lot of moves, for example node b has to both 
send data to node c and get data from node a. 
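
Here is a rough Python sketch of the contrived example above, just to make the 
streaming visible. It is not Cassandra code: the function names are made up, 
the ranges are plain (low, high) pairs taken as written (so the gaps between 3 
and 4 and between 6 and 7 have no previous owner), and I am assuming node a 
shrinks to 1 to 2 in the first option.

```python
# Ranges are (low, high) pairs copied from the example above.
OLD = {"a": (1, 3), "b": (4, 6), "c": (7, 10)}

# Option 1: node d takes over part of node a's range only
# (assumption: node a keeps 1 to 2).
SUBSET = {"a": (1, 2), "b": (4, 6), "c": (7, 10), "d": (2, 3)}

# Option 2: rebalance all four nodes evenly.
BALANCED = {"a": (1, 2.5), "b": (2.5, 5.0), "c": (5.0, 7.5), "d": (7.5, 10)}


def transfers(old, new):
    """Return (sender, receiver, low, high) for every slice that changes owner."""
    moves = []
    for receiver, (new_lo, new_hi) in new.items():
        for sender, (old_lo, old_hi) in old.items():
            # Overlap between the receiver's new range and the sender's old range.
            lo, hi = max(new_lo, old_lo), min(new_hi, old_hi)
            if sender != receiver and lo < hi:
                moves.append((sender, receiver, lo, hi))
    return moves


def send_and_receive(moves):
    """Return the nodes that have to stream data both out and in."""
    senders = {sender for sender, _, _, _ in moves}
    receivers = {receiver for _, receiver, _, _ in moves}
    return senders & receivers


for name, layout in (("subset", SUBSET), ("balanced", BALANCED)):
    moves = transfers(OLD, layout)
    print(name, moves, "both directions:", sorted(send_and_receive(moves)))
```

In the subset layout only node a streams out. In the balanced layout nodes b 
and c have to stream in both directions.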

        AFAIK the easier path when growing is to double the number of nodes. 
Cassandra does support more complicated moves but they may require a lot of 
resources. How this impacts your system depends on load, data size and IO 
capacity. 
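
To illustrate why doubling is the easy case, here is a small Python sketch. It 
assumes the RandomPartitioner token space of 0 to 2**127 and a ring balanced by 
spacing tokens evenly. The function names are hypothetical, not part of any 
Cassandra tool. It counts how many existing nodes would need a new token to 
keep the ring balanced after growing.

```python
# Assumption: RandomPartitioner, whose tokens fall in the range 0 .. 2**127.
TOKEN_SPACE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced tokens for a ring of node_count nodes."""
    return [i * TOKEN_SPACE // node_count for i in range(node_count)]


def nodes_that_must_move(old_count, new_count):
    """Count existing nodes whose token is not part of the new balanced ring."""
    old = set(balanced_tokens(old_count))
    new = set(balanced_tokens(new_count))
    return len(old - new)


print("3 -> 6 nodes:", nodes_that_must_move(3, 6), "existing nodes move")
print("3 -> 4 nodes:", nodes_that_must_move(3, 4), "existing nodes move")
```

When the node count doubles, every old token is still a valid token in the new 
ring, so the existing nodes stay put and each new node bisects one existing 
range. Going from 3 to 4 nodes, two of the three existing nodes would need to 
move to keep the ring balanced.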

Hope that helps. 
Aaron

On 8/03/2011, at 9:28 AM, Paul Pak wrote:

> Hello,
> 
> I'm doing some testing of Cassandra and I've read a lot about people
> running into situations growing their clusters.  So, I'm about to test
> it with .7.3.  I've got a test node which is a single node with a
> replication factor of 1.  I'd like to grow it to 3 nodes and a
> replication factor of 2.  Then, I'd like to grow the cluster to 10 nodes
> with a replication factor of 3.
> 
> I'm trying to determine a few things.
> a) What kind of load it will put on the cluster that is "in use"
> b) How long will it take?
> c) Are there any gotchas in the process?
> d) How easy/difficult will this be?
> 
> If anyone has any experience with growing clusters, please share.    If
> someone can give an idea of what I need to do for growing the clusters
> properly, I'll be happy to do it and report back anything I find.
> 
> Paul
