I do not know of any articles I could send your way, though others may have some tales from running production systems. Here are a few thoughts; others, please correct me if I am wrong:

- The replication factor is not intended to be changed on a running system. It can be, but it will be a heavyweight process: http://wiki.apache.org/cassandra/Operations#Replication

- When adding nodes to a cluster it's more efficient if you can change the ranges of the existing nodes to be a subset of what they were responsible for previously. That way a node only has to stream out data, rather than stream out and stream in data.

Say you have this contrived example (where values are real numbers between 1 and 10):

- node a -> values 1 to 3
- node b -> values 4 to 6
- node c -> values 7 to 10

And you want to add node d:

- If you add node d to handle values between 2 and 3, you can stream node a's data over and then delete the data node a is no longer responsible for.

- If you want a more balanced ring, you may want to change all the ranges to be:

  - node a -> values 1 to 2.5
  - node b -> values 2.5 to 5.0
  - node c -> values 5.0 to 7.5
  - node d -> values 7.5 to 10

  In this case there are a lot of moves; for example, node b has to both send data to node c and get data from node a.

AFAIK the easier path when growing is to double the number of nodes. Cassandra does support more complicated moves, but they may require a lot of resources. How this impacts your system depends on load, data size and IO capacity.

Hope that helps.
Aaron

On 8/03/2011, at 9:28 AM, Paul Pak wrote:

> Hello,
>
> I'm doing some testing of Cassandra and I've read a lot about people
> running into situations growing their clusters. So, I'm about to test
> it with .7.3. I've got a test node which is a single node with a
> replication factor of 1. I'd like to grow it to 3 nodes and a
> replication factor of 2. Then, I'd like to grow the cluster to 10 nodes
> with a replication factor of 3.
>
> I'm trying to determine a few things.
> a) What kind of load it will put on the cluster that is "in use"
> b) How long will it take?
> c) Are there any gotchas in the process?
> d) How easy/difficult will this be?
>
> If anyone has any experience with growing clusters, please share. If
> someone can give an idea of what I need to do for growing the clusters
> properly, I'll be happy to do it and report back anything I find.
>
> Paul
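
PS: to make the contrived range example in the reply a bit more concrete, here is a rough Python sketch of the arithmetic. It is not how Cassandra computes anything; the function names (plan_moves, nodes_moving_both_ways) and the idea of using range width as a stand-in for data volume are my own assumptions for illustration. It only compares each node's old range with its new one to see how much it would have to receive and how much it would give up.

```python
def overlap(a, b):
    """Width of the overlap between two (low, high) ranges; 0.0 if disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def plan_moves(old, new):
    """Compare old and new range assignments (hypothetical helper).

    For every node in the new layout, report the width of range it must
    stream in (new range not previously owned) and the width it gives up
    (old range it no longer owns). Width is only a proxy for data size.
    """
    plan = {}
    for node, new_rng in new.items():
        old_rng = old.get(node)
        new_len = new_rng[1] - new_rng[0]
        old_len = (old_rng[1] - old_rng[0]) if old_rng else 0.0
        kept = overlap(old_rng, new_rng) if old_rng else 0.0
        plan[node] = {"streams_in": new_len - kept, "gives_up": old_len - kept}
    return plan


def nodes_moving_both_ways(plan):
    """Nodes that both receive new range and give up old range."""
    return sorted(
        n for n, p in plan.items() if p["streams_in"] > 0 and p["gives_up"] > 0
    )


# The contrived ring from the reply.
old = {"a": (1.0, 3.0), "b": (4.0, 6.0), "c": (7.0, 10.0)}

# Option 1: node d takes a slice of node a's range.
split = {"a": (1.0, 2.0), "b": (4.0, 6.0), "c": (7.0, 10.0), "d": (2.0, 3.0)}

# Option 2: rebalance every range.
balanced = {"a": (1.0, 2.5), "b": (2.5, 5.0), "c": (5.0, 7.5), "d": (7.5, 10.0)}

for name, layout in (("split", split), ("balanced", balanced)):
    plan = plan_moves(old, layout)
    print(name, plan, "both ways:", nodes_moving_both_ways(plan))
```

With these numbers, the first layout leaves no existing node streaming data in (only node a gives up part of its range), while the balanced layout has nodes b and c both receiving and giving up ranges, which is the extra work described above.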