Hi Peter - Thanks for the response and adding the FAQ. Really great answers.
So if I understand correctly, the nodes getting the replica copies are predetermined, based on the replication strategy. One thing is still a bit unclear. Once a node establishes which nodes hold its replicas, are those replica nodes always used? In other words, given RF=3, if the other two replica nodes go down, is it correct that the "original" node will not automatically pick two new nodes for its replica copies?

Jon

On Mar 26, 2011, at 12:05 AM, Peter Schuller wrote:

>> Does anyone know how cassandra chooses the nodes for its other replicant
>> copies?
>
> This keeps coming up so I added a FAQ entry:
>
> http://wiki.apache.org/cassandra/FAQ#replicaplacement
>
> I don't quite like the phrasing but I couldn't come up with anything that
> was sufficiently clear and complete right now.
>
>> The first node gets the first copy because its token is assigned for that
>> key. But what about the other copies of the data?
>> Do the replicant nodes stay the same based on the token range? Or are the
>> other copies sent to any random node based on its load and availability?
>> I think this is important in order to understand because it affects how to
>> plan for situations where a significant number of nodes are suddenly
>> unavailable, such as the loss of a data center.
>
> I hope the above is answered by the FAQ. If it's unclear please say so
> and we can clarify.
>
>> If the replicants are copied just based on random availability, then quorum
>> writes could survive on the remaining nodes. But if the replicant nodes are
>> somehow pre-determined, those replicants may not be available and writes
>> will fail.
>
> I'm not really following this though. Why would you ever want data to
> be placed based on "random availability"?
>
> If you are writing at QUORUM, a quorum of nodes in the replica set
> must have ack:ed the write in order for the write to be considered
> successful (similar for reads). If a sufficient number of nodes are
> up, you're fine. If not, then no - fundamentally that would violate
> the requirement of quorum.
>
> For example, if you're at RF=3, at least two nodes (in the replica set
> for a given key) must be responding to your request in order for them
> to succeed at QUORUM.
>
> --
> / Peter Schuller
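
To make the placement and quorum rules discussed above concrete, here is a minimal Python sketch. It is not Cassandra code. It assumes a simple ring-walk placement (the first RF nodes clockwise from the key's token), and the ring contents, node names, and function names are all hypothetical. The point it illustrates is that the replica set depends only on ring membership, not on which nodes happen to be up.

```python
from bisect import bisect_left

# Hypothetical ring: token -> node name (illustrative values, not real tokens).
RING = {0: "A", 25: "B", 50: "C", 75: "D"}


def replicas_for(key_token, ring, rf):
    """Return the first `rf` nodes walking clockwise from the key's token.

    Assumption: a node owns the range ending at its own token, so the first
    replica is the node with the smallest token >= key_token (wrapping around).
    """
    tokens = sorted(ring)
    start = bisect_left(tokens, key_token) % len(tokens)
    count = min(rf, len(tokens))
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]


def quorum(rf):
    """Number of replicas that must respond at QUORUM."""
    return rf // 2 + 1


def quorum_possible(key_token, ring, rf, down):
    """True if enough nodes in the key's fixed replica set are still up."""
    live = [n for n in replicas_for(key_token, ring, rf) if n not in down]
    return len(live) >= quorum(rf)


if __name__ == "__main__":
    print(replicas_for(30, RING, 3))                 # replica set for token 30
    print(quorum_possible(30, RING, 3, {"D"}))       # one replica down
    print(quorum_possible(30, RING, 3, {"D", "A"}))  # two replicas down
```

In this sketch the set `down` never feeds into `replicas_for`, so losing two of the three replicas for a key leaves only one candidate, and a QUORUM request for that key cannot succeed even though node B is still up.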