It is not common, but I know of multiple organizations running with RF=5, in at least one DC, for HA reasons.
-Tupshin

On Apr 15, 2014 2:36 PM, "Robert Coli" <rc...@eventbrite.com> wrote:

> On Tue, Apr 15, 2014 at 6:14 AM, Ken Hancock <ken.hanc...@schange.com> wrote:
>
>> Keep in mind if you lose the wrong two, you can't satisfy quorum. In a
>> 5-node cluster with RF=3, it would be impossible to lose 2 nodes without
>> affecting quorum for at least some of your data. In a 6 node cluster, once
>> you've lost one node, if you were to lose another, you only have a 1-in-5
>> chance of not affecting quorum for some of your data.
>
> This is why the real highly available way to run Cassandra with QUORUM is
> RF=5, with 5 "racks".
>
> Briefly, any given node running a JVM based distributed application should
> be assumed to potentially become transiently unavailable for a short time,
> for example during long GC pauses or rolling restarts. There is also a
> chance of non-transient failure (hard down) at any time, and a much smaller
> chance of two simultaneous non-transient failures. If you have RF=3 and
> lose two nodes (one transient, the other non-transient) in a range, that
> range is now unavailable because quorum is 2 and 3-2 is 1, which is less
> than 2. If you have RF=5 and lose two nodes in the same way, quorum is 3
> and 5-2 is 3, which is equal to 3.
>
> AFAICT, no one actually runs Cassandra in this way because keeping 5
> copies of your already denormalized data seems excessive and is difficult
> to justify to management.
>
> =Rob
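
For anyone who wants to check the arithmetic quoted above, here is a minimal Python sketch. It is not Cassandra code: the function names are made up for illustration, and it assumes a simplified ring with one token per node, where each range is replicated on a node and the next RF-1 nodes clockwise (no vnodes, no rack awareness).

```python
from itertools import combinations


def quorum(rf):
    # QUORUM is a majority of replicas: floor(RF / 2) + 1.
    return rf // 2 + 1


def survives(rf, lost):
    # True if a range with `rf` replicas still has quorum after `lost` replicas are down.
    return rf - lost >= quorum(rf)


def replica_sets(nodes, rf):
    # Assumed placement: range i lives on node i and the next rf-1 nodes clockwise.
    return [frozenset((i + k) % nodes for k in range(rf)) for i in range(nodes)]


def safe_pairs(nodes, rf):
    # Pairs of nodes whose simultaneous loss leaves every range with quorum.
    sets = replica_sets(nodes, rf)
    q = quorum(rf)
    return [
        pair
        for pair in combinations(range(nodes), 2)
        if all(len(s - set(pair)) >= q for s in sets)
    ]


if __name__ == "__main__":
    print(survives(3, 2))          # RF=3, two replicas down: 3-2=1 < 2
    print(survives(5, 2))          # RF=5, two replicas down: 5-2=3 >= 3
    print(safe_pairs(5, 3))        # 5 nodes, RF=3
    print(len(safe_pairs(6, 3)))   # 6 nodes, RF=3, out of 15 possible pairs
```

Under those assumptions, the 5-node case has no safe pair at all, and the 6-node case has 3 safe pairs out of 15, which matches the 1-in-5 figure Ken gives.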