tl;dr: make sure you have enough capacity in the event of node failure. For
light workloads, that can be satisfied with nodes=RF.

-Tupshin
On Apr 14, 2014 2:35 PM, "Robert Coli" <rc...@eventbrite.com> wrote:

> On Mon, Apr 14, 2014 at 2:25 AM, Markus Jais <markus.j...@yahoo.de> wrote:
>
>> "It is generally not recommended to set a replication factor of 3 if you
>> have fewer than six nodes in a data center".
>>
>
> I have a detailed post about this somewhere in the archives of this list
> (which I can't seem to find right now), but briefly, the "6-for-3" advice
> relates to the percentage of capacity you have remaining when you have a
> node down. It has become slightly less accurate over time because vnodes
> reduce bootstrap time and there have been other improvements to node
> startup time.
>
> If you have fewer than 6 nodes with RF=3, you lose >1/6th of capacity when
> you lose a single node, which is a significant percentage of total cluster
> capacity. You then lose another meaningful percentage of your capacity when
> your existing nodes participate in rebuilding the missing node. If you are
> then unlucky enough to lose another node, you are missing a very
> significant percentage of your cluster capacity and have to use a
> relatively small fraction of it to rebuild the now two down nodes.
>
> I wouldn't generalize the rule of thumb as "don't run under N=RF*2", but
> rather as "probably don't run RF=3 under about 6 nodes". IOW, in my view,
> the most operationally sane initial number of nodes for RF=3 is likely
> closer to 6 than 3.
>
> =Rob
>
>
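A rough back-of-envelope sketch of the capacity arithmetic described above, in Python. It assumes identical nodes, evenly balanced token ownership (for example with vnodes), and request load that spreads evenly over the surviving nodes. It does not model the extra load of streaming data to rebuild a node. The function names are hypothetical and only for illustration.

```python
from fractions import Fraction


def capacity_remaining(nodes: int, down: int) -> Fraction:
    """Fraction of total cluster capacity still online when `down` nodes fail.

    Assumes every node has identical hardware (a simplification)."""
    if not 0 <= down < nodes:
        raise ValueError("need 0 <= down < nodes")
    return Fraction(nodes - down, nodes)


def load_factor_on_survivors(nodes: int, down: int) -> Fraction:
    """How much each surviving node's share of request load grows, assuming
    total load is unchanged and redistributes evenly over the survivors."""
    return 1 / capacity_remaining(nodes, down)


def data_share_per_node(nodes: int, rf: int) -> Fraction:
    """Fraction of the full data set stored on each node in a balanced ring."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return min(Fraction(1), Fraction(rf, nodes))


RF = 3
for n in range(3, 10):
    lost = 1 - capacity_remaining(n, 1)
    print(
        f"N={n} RF={RF}: each node holds {float(data_share_per_node(n, RF)):.0%} of the data; "
        f"one node down loses {float(lost):.1%} of capacity, "
        f"survivors carry {float(load_factor_on_survivors(n, 1)):.2f}x load; "
        f"two down leaves {float(capacity_remaining(n, 2)):.1%} online"
    )
```

Under these assumptions, a single failure costs 1/N of capacity, which exceeds 1/6 exactly when N < 6. With nodes=RF every node holds a full copy of the data, which is why that layout can work for light workloads as long as the survivors have headroom for the extra load.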
