Re: Replication Factor question

Tupshin Harper Tue, 15 Apr 2014 11:40:07 -0700

It is not common, but I know of multiple organizations running with RF=5,
in at least one DC, for HA reasons.


-Tupshin
On Apr 15, 2014 2:36 PM, "Robert Coli" <rc...@eventbrite.com> wrote:

> On Tue, Apr 15, 2014 at 6:14 AM, Ken Hancock <ken.hanc...@schange.com> wrote:
>
>> Keep in mind if you lose the wrong two, you can't satisfy quorum.  In a
>> 5-node cluster with RF=3, it would be impossible to lose 2 nodes without
>> affecting quorum for at least some of your data. In a 6 node cluster, once
>> you've lost one node, if you were to lose another, you only have a 1-in-5
>> chance of not affecting quorum for some of your data.
>>
>
> This is why the real highly available way to run Cassandra with QUORUM is
> RF=5, with 5 "racks".
>
> Briefly, any given node running a JVM based distributed application should
> be assumed to potentially become transiently unavailable for a short time,
> for example during long GC pauses or rolling restarts. There is also a
> chance of non-transient failure (hard down) at any time, and a much smaller
> chance of two simultaneous non-transient failures. If you have RF=3 and
> lose two nodes (one transient, the other non-transient) in a range, that
> range is now unavailable because quorum is 2 and 3-2 is 1, which is less
> than 2. If you have RF=5 and lose two nodes in the same way, quorum is 3
> and 5-2 is 3, which is equal to 3.
>
> AFAICT, no one actually runs Cassandra in this way because keeping 5
> copies of your already denormalized data seems excessive and is difficult
> to justify to management.
>
> =Rob
>
>
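
For anyone who wants to check the arithmetic quoted above, here is a rough
Python sketch. It is not how Cassandra places replicas in general: it assumes
one token per node (no vnodes), no rack awareness, and that each range is
stored on the next RF nodes clockwise around the ring. The function names are
made up for illustration.

```python
from itertools import combinations


def quorum(rf: int) -> int:
    """Replicas that must respond at QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def survives(rf: int, replicas_down: int) -> bool:
    """True if a range with `replicas_down` dead replicas can still reach quorum."""
    return rf - replicas_down >= quorum(rf)


def replica_sets(nodes: int, rf: int) -> list[set[int]]:
    """Replica set for each range.

    Assumption: one token per node, and each range lives on the next `rf`
    nodes clockwise (no vnodes, no racks).
    """
    return [{(start + k) % nodes for k in range(rf)} for start in range(nodes)]


def quorum_safe_pairs(nodes: int, rf: int) -> list[tuple[int, int]]:
    """Pairs of nodes that can both be down without any range losing quorum."""
    sets = replica_sets(nodes, rf)
    safe = []
    for pair in combinations(range(nodes), 2):
        down = set(pair)
        if all(survives(rf, len(rs & down)) for rs in sets):
            safe.append(pair)
    return safe


if __name__ == "__main__":
    for rf in (3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"survives 2 replicas down: {survives(rf, 2)}")
    for nodes in (5, 6):
        safe = quorum_safe_pairs(nodes, 3)
        print(f"{nodes} nodes, RF=3: quorum-safe pairs = {safe}")
```

Under those assumptions the 5-node, RF=3 ring has no pair of nodes that can be
lost safely, and the 6-node ring has only the three pairs of directly opposite
nodes, which matches the 1-in-5 figure once one node is already down. With
vnodes or rack-aware placement the replica sets look different, so treat this
as a check on the reasoning rather than a capacity-planning tool.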
