Thanks for the correction about Keyspace versus ColumnFamily ... I knew that, just mistyped.
I guess it should be stated (to be obvious) that when you are auto-bootstrapping a node, the seed had better be alive. The scenario I'm dealing with is that it might not be (the reasons for that are tangential). I am contemplating a situation where there may be 2N servers, but only N online at any one time ... or, for operational purposes, N+n (where n is 1 or 2), so the number of live nodes may occasionally be greater than R.

This gets to the hinted hand-off question. If R=8 and N=8, and all was fine for a while, and then N8(S8) (node 8, server 8) goes down and N8(S9) replaces it, N8(S9) will take the hit to obtain all the data it never had before. Then, at some subsequent time, N9(S8) comes back to life. Will it take over its former role, so that R is now effectively 9 even though storage-conf had set it to 8 for that particular keyspace?

I'm asking these questions because they've been asked of me. I've been working with Cassandra for 3+ months now, and this level of the key management is something I struggle to get my head around.

What do you mean by "token automatic assignment may not do what you want"? If I specify R=N, then what I want is all data replicated to all nodes. What does a power of 2 have to do with this? Are there undocumented recommendations about cluster size to ensure that you can survive any one (or two) nodes failing?

Thanks in advance.

-phil

On Jun 4, 2010, at 1:46 PM, Benjamin Black wrote:

> On Fri, Jun 4, 2010 at 10:36 AM, Philip Stanhope <pstanh...@wimba.com> wrote:
>>
>> Here's the scenario: would like R = N where N is the number of nodes. Let's
>> say 8.
>>
>> 1. Create first node, modify storage-conf.xml and change the <Seed/> to be
>> the ip of the node. Change replication factor to 8 for the CF of interest.
>> Start the puppy up.
>>
>
> RF is per Keyspace, not per CF.
>
>> 2. Create 2nd node, modify storage-conf.xml and change <AutoBootstrap/> to
>> true and let it know the first seed. Ensure replication factor is 8 for the
>> CF of interest.
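For reference, a rough sketch of the 0.6-era storage-conf.xml fragments being described in these steps, with RF set on the Keyspace (per Ben's correction). The keyspace name, CF name, and IP address are made-up placeholders, not values from this thread:

```xml
<!-- Sketch only: abbreviated 0.6-style storage-conf.xml; names/IPs are placeholders -->
<Storage>
  <Keyspaces>
    <Keyspace Name="MyKeyspace">
      <ColumnFamily Name="MyCF" CompareWith="BytesType"/>
      <!-- ReplicationFactor lives here, on the Keyspace, not on a ColumnFamily -->
      <ReplicationFactor>8</ReplicationFactor>
      <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
    </Keyspace>
  </Keyspaces>
  <!-- On node 1 this points at itself; later nodes point at node 1 -->
  <Seeds>
    <Seed>10.0.0.1</Seed>
  </Seeds>
  <!-- false on the first node, true on nodes joining an existing cluster -->
  <AutoBootstrap>true</AutoBootstrap>
</Storage>
```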
>> Start the puppy up.
>>
>
> If you do it this way, be aware that token automatic assignment may not do
> what you want. It _probably_ will, since 8 is a power of 2, but be aware.
>
>> 3. Create 3rd node.
>>
>> Q1: Should node1 and node2 be listed as seeds? Or only node1?
>>
>
> Doesn't matter. Seeds are only used as a discovery mechanism. One is
> sufficient.
>
>> 4. Create 4th node. Same question as before.
>>
>> Q2: Same question as before ... should the list of seeds grow as the cluster
>> grows? Alternative phrasing ... what is the relationship between Seed and
>> AutoBootstrap, i.e. can a Seed node in fact be a node that was
>> autobootstrapped? Is this considered best practice?
>>
>
> Once a node is bootstrapped, auto or otherwise, that's it. It is now
> just another node in the cluster. How it got that way is not relevant.
>
>> At this point we've got 4 nodes in the cluster. I've gotten this far with
>> no problems, loaded tons of data, and compared performance with various
>> replication factors. Seeing faster reads from any particular node (as
>> expected) when the ReplicationFactor is equal to the number of nodes in the
>> cluster. Have compared lots of single updates/creates as well as batch_mutate
>> (which is very fast for bootstrapping the CFs -- highly recommended).
>>
>> Also seeing varying performance on reads (fast, and as expected) when
>> ReplicationFactor < N.
>>
>> Q3: What issue, if any, is there when R > N?
>>
>
> Not recommended.
>
>> This is the situation as you're bringing up nodes in the cluster, and when
>> you take down a node (intentionally or as a failure).
>>
>> I know one consideration is that if R >= N and the CF data grows ever
>> bigger, there will be a hit as the node is created.
>>
>> Q4: If you know that you're never going to have more than 40
>> (MaxExpectedClusterNodes) nodes in your cluster, is it safe to set R >=
>> MaxExpectedClusterNodes?
>>
>
> Setting it higher is not going to help you.
> It is also unclear to me
> how having a cluster that large with an RF that high is going to
> behave. Read repair (which happens on every call) is going to be
> _brutal_.
>
>> Q5: If you set R = MaxExpectedClusterNodes, and you end up servicing a
>> node and bringing up an alternate node in its place (thus having R = N
>> at all times), and then you bring up the N+1 node ... will it start to
>> receive the data that it missed while it was down?
>>
>
> This is the Hinted Handoff mechanism.
>
>
> b
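To illustrate the "power of 2" remark about automatic token assignment: with the RandomPartitioner, the token ring spans 0 to 2^127, and a bootstrapping node roughly bisects the most heavily loaded existing range. The sketch below is an idealized model of that behavior (not Cassandra's actual implementation, which also weighs live load): repeated bisection lands exactly on the perfectly balanced layout when the node count is a power of 2, and misses it otherwise.

```python
# Sketch: idealized token bisection on a RandomPartitioner ring (0 .. 2**127).
# Models bootstrap as "each new node splits the largest arc in half";
# an approximation for illustration, not Cassandra's real token picker.

RING = 2 ** 127

def ideal_tokens(n):
    """Perfectly balanced initial tokens for an n-node cluster."""
    return [i * RING // n for i in range(n)]

def bisect_bootstrap(n):
    """Tokens after starting one node and letting n-1 more bisect the largest arc."""
    tokens = [0]
    while len(tokens) < n:
        tokens.sort()
        # Arc length owned by each token, wrapping around the ring;
        # a single token owns the entire ring.
        arcs = [((tokens[(i + 1) % len(tokens)] - tokens[i]) % RING or RING, i)
                for i in range(len(tokens))]
        size, i = max(arcs)
        tokens.append((tokens[i] + size // 2) % RING)
    return sorted(tokens)

# 8 nodes (a power of 2): bisection reproduces the balanced layout.
print(bisect_bootstrap(8) == ideal_tokens(8))
# 6 nodes: it does not, so some nodes own twice the range of others.
print(bisect_bootstrap(6) == ideal_tokens(6))
```

This is why, for a node count that is not a power of 2, the usual advice is to assign each node an explicit InitialToken of i * 2^127 / N rather than relying on automatic assignment.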