I will reply to myself and raise a flag in case someone is interested. Assuming the tokens are replicated as follows:

"""
Request to update row X:
1. Compute the token from the row key.
2. Identify the server that owns that token.
3. Place one replica there.
4. Increment the token until you reach a different server.
5. Place the next replica there, and repeat steps 4 and 5 until RF replicas have been placed.
"""
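The replication procedure above can be sketched in Python as a walk around a token ring. This is a simplified model under my own assumptions, not Cassandra's code. The ring is a list in which entry i is the server owning token T(i+1), and "increment the token until you reach a different server" is read as "skip servers that already hold a replica". The example ring reproduces the ownership given in this thread for S1's tokens and their successors. The owners of T4, T8, T12 and T16 are hypothetical, because the original picture is not available.

```python
from typing import List

# Hypothetical 16-token ring: VNODE_RING[i] is the server owning token T(i+1).
# Owners of T1-T3, T5-T7, T9-T11 and T13-T15 follow the example in this thread;
# owners of T4, T8, T12 and T16 are made up (assumption).
VNODE_RING: List[str] = [
    "S1", "S2", "S4", "S3",  # T1..T4
    "S1", "S2", "S4", "S3",  # T5..T8
    "S1", "S2", "S3", "S4",  # T9..T12
    "S1", "S2", "S3", "S4",  # T13..T16
]


def place_replicas(ring: List[str], start: int, rf: int) -> List[str]:
    """Return the servers holding the rf replicas of the token at index start.

    The first replica goes to the owner of that token. We then keep
    incrementing the token (wrapping around the ring) and place the next
    replica on the first server that does not already hold one.
    """
    if rf > len(set(ring)):
        raise ValueError("rf cannot exceed the number of distinct servers")
    replicas: List[str] = []
    i = start
    while len(replicas) < rf:
        owner = ring[i % len(ring)]
        if owner not in replicas:
            replicas.append(owner)
        i += 1
    return replicas
```

With this ring, place_replicas(VNODE_RING, 0, 3) returns ["S1", "S2", "S4"], that is, T1 is replicated to S2 and S4, while place_replicas(VNODE_RING, 12, 3) returns ["S1", "S2", "S3"] for T13.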
Then the number of virtual nodes may affect availability. Consider the previous example, and call the tokens Tx and the servers Sx, where x is the number of the token or server respectively. With RF = 3, this means that on S1:

T1 is replicated to S2 (owner of T2) and S4 (owner of T3)
T5 is replicated to S2 (owner of T6) and S4 (owner of T7)
T13 is replicated to S2 (owner of T14) and S3 (owner of T15)
T9 is replicated to S2 (owner of T10) and S3 (owner of T11)

These are the possible data loss scenarios involving S1:

I lose S1, S2 and S3 => I lose T13 and T9
I lose S1, S2 and S4 => I lose T1 and T5

==> This is lower availability than a scenario in which each server has only one token, because more combinations of three failed servers lead to data loss. Consider this simple case <http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/file/n7593326/355a30c68fa0538957eca97ebb226cc3.png>

Again with RF = 3, T1 is replicated to S2 and S3. The only data loss scenario involving S1 is:

I lose S1, S2 and S3 => I lose T1

which is one scenario fewer than in the previous case. The gap between the two grows with the number of virtual nodes.

Can anyone confirm this conjecture? Thanks

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replication-with-virtual-nodes-tp7593310p7593326.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
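The counting of data loss scenarios in this message can be checked mechanically. The sketch below is my own model, not Cassandra code. It enumerates every set of RF failed servers that includes a given server and, following the message's counting, lists only the tokens owned by that server whose replicas all fall inside the failed set. As before, the owners of T4, T8, T12 and T16 in the vnode ring are hypothetical, and the single-token ring order S1, S2, S3, S4 is assumed because the picture is unavailable.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# VNODE_RING[i] owns token T(i+1); owners of T4, T8, T12, T16 are hypothetical.
VNODE_RING = ["S1", "S2", "S4", "S3", "S1", "S2", "S4", "S3",
              "S1", "S2", "S3", "S4", "S1", "S2", "S3", "S4"]
# One token per server; the ring order is an assumption.
SINGLE_RING = ["S1", "S2", "S3", "S4"]


def place_replicas(ring: List[str], start: int, rf: int) -> List[str]:
    """Walk the ring from start, collecting rf distinct servers."""
    if rf > len(set(ring)):
        raise ValueError("rf cannot exceed the number of distinct servers")
    replicas: List[str] = []
    i = start
    while len(replicas) < rf:
        owner = ring[i % len(ring)]
        if owner not in replicas:
            replicas.append(owner)
        i += 1
    return replicas


def loss_scenarios(ring: List[str], rf: int,
                   server: str) -> Dict[Tuple[str, ...], List[str]]:
    """Map each set of rf failed servers containing `server` to the tokens
    owned by `server` whose replicas all sit inside that failed set."""
    result: Dict[Tuple[str, ...], List[str]] = {}
    for failed in combinations(sorted(set(ring)), rf):
        if server not in failed:
            continue
        lost = [f"T{i + 1}" for i, owner in enumerate(ring)
                if owner == server
                and set(place_replicas(ring, i, rf)) <= set(failed)]
        if lost:  # keep only failure sets that actually lose data
            result[failed] = lost
    return result
```

For the vnode ring, loss_scenarios(VNODE_RING, 3, "S1") yields two scenarios, {S1, S2, S3} losing T9 and T13 and {S1, S2, S4} losing T1 and T5, while loss_scenarios(SINGLE_RING, 3, "S1") yields only {S1, S2, S3} losing T1. Note that, like the message, this counts only tokens owned by S1.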