Hi Rob, thanks. How many nodes do you have running in those 5 racks with RF=5? Only 5 nodes, or more?
Markus

Robert Coli <rc...@eventbrite.com> wrote at 20:36 on Tuesday, 15 April 2014:

On Tue, Apr 15, 2014 at 6:14 AM, Ken Hancock <ken.hanc...@schange.com> wrote:
>
> Keep in mind if you lose the wrong two, you can't satisfy quorum. In a
> 5-node cluster with RF=3, it would be impossible to lose 2 nodes without
> affecting quorum for at least some of your data. In a 6 node cluster, once
> you've lost one node, if you were to lose another, you only have a 1-in-5
> chance of not affecting quorum for some of your data.
>
> This is why the real highly available way to run Cassandra with QUORUM is
> RF=5, with 5 "racks".
>
> Briefly, any given node running a JVM based distributed application should
> be assumed to potentially become transiently unavailable for a short time,
> for example during long GC pauses or rolling restarts. There is also a
> chance of non-transient failure (hard down) at any time, and a much smaller
> chance of two simultaneous non-transient failures. If you have RF=3 and lose
> two nodes (one transient, the other non-transient) in a range, that range is
> now unavailable because quorum is 2 and 3-2 is 1, which is less than 2. If
> you have RF=5 and lose two nodes in the same way, quorum is 3 and 5-2 is 3,
> which is equal to 3.
>
> AFAICT, no one actually runs Cassandra in this way because keeping 5 copies
> of your already denormalized data seems excessive and is difficult to
> justify to management.
>
> =Rob
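
For anyone following the arithmetic in the quoted message, here is a minimal Python sketch of it. It is only an illustration under assumptions that are not stated in the thread: a single datacenter, no vnodes, and simple ring placement where each range lives on its owning node plus the next RF-1 nodes clockwise. The function names are made up for this example and are not part of any Cassandra API.

```python
from itertools import combinations


def quorum(rf: int) -> int:
    """Replicas that must respond for QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def replica_sets(nodes: int, rf: int) -> list[set[int]]:
    """Assumed placement: range i lives on node i and the next rf-1 ring nodes."""
    return [{(i + k) % nodes for k in range(rf)} for i in range(nodes)]


def quorum_survives(nodes: int, rf: int, down: set[int]) -> bool:
    """True if every range still has at least quorum(rf) live replicas."""
    need = quorum(rf)
    return all(len(replicas - down) >= need for replicas in replica_sets(nodes, rf))


def safe_pairs(nodes: int, rf: int) -> int:
    """Count the two-node failures that leave QUORUM intact for all ranges."""
    return sum(
        quorum_survives(nodes, rf, set(pair))
        for pair in combinations(range(nodes), 2)
    )
```

Under those assumptions, safe_pairs(5, 3) is 0 (any two losses break quorum somewhere), safe_pairs(6, 3) is 3 of the 15 possible pairs (which matches Ken's 1-in-5 once one node is already down), and safe_pairs(5, 5) is 10, i.e. every pair is survivable.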