Re: Replicating to all nodes

Peter Schuller Fri, 15 Jul 2011 06:54:20 -0700

> The goal is to configure a cluster in which reads and writes can
> complete successfully even if only 1 node is online. For this to work,


Why? You should be designing for "only 1 out of N nodes" where N is
the RF, not the size of the cluster. If you happen to have 3 machines
now and you want 3 copies in total, that's fine. But why would you
want RF=10 just because you add 10 nodes? It seems really off. You add
nodes to increase total capacity. The number of copies of each piece
of data, for redundancy purposes, is usually a completely separate
concern from your cluster size.
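
To make the separation between RF and cluster size concrete, here is a
minimal Python sketch of a toy ring. The placement rule (hash the key,
then take the next RF nodes) and the names `replicas_for` and `node0`,
`node1`, ... are assumptions for illustration only; Cassandra's real
partitioners and replication strategies are more involved.

```python
import hashlib


def replicas_for(key, nodes, rf):
    """Return the rf nodes that hold `key` on a toy ring.

    Hypothetical placement: hash the key to a starting position, then
    take the next rf nodes in ring order.
    """
    if rf > len(nodes):
        raise ValueError("RF cannot exceed the number of nodes")
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


RF = 3
for n in (3, 4, 10):
    nodes = [f"node{i}" for i in range(n)]
    owners = replicas_for("user:42", nodes, RF)
    # The key always has RF copies; only each node's share of the
    # total dataset (roughly RF / n) shrinks as the cluster grows.
    print(n, owners, f"share per node ~ {RF / n:.0%}")
```

With 3 nodes every node holds everything; with 10 nodes each key still
has exactly 3 copies, and each node holds roughly 30% of the data.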

> each node would need the entire dataset. Your example of a 3 node ring
> with RF=3 would satisfy this requirement. However, if two nodes are
> offline, CL.QUORUM would not work, I would need to use CL.ONE. If all
> 3 nodes are online, CL.ONE is undershooting, I would want to use
> CL.QUORUM (or maybe CL.ALL). Or does CL.ONE actually function this
> way, somewhat?

Writes ALWAYS go to all replica nodes eventually; usually immediately
if all nodes are up. The consistency level ONLY affects how many
replicas are *required* to have successfully *acked* before a write
(or a read) returns to the client. Using CL.ONE never means that data
won't be replicated to all replicas eventually.
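
A small Python sketch of that distinction, under simplifying
assumptions: `required_acks` and `write` are hypothetical helper names,
and a replica is modelled as just an up/down flag. How a down replica
catches up later (for example hinted handoff or repair) is not
modelled here.

```python
def required_acks(cl, rf):
    """Number of replica acks the coordinator waits for at a given CL."""
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {cl}")


def write(replica_up, cl):
    """Simulate one write against the replicas of a single key.

    replica_up: list of booleans, one per replica (True = reachable).
    Returns (succeeded, pending): whether the client sees success, and
    how many replicas still have to receive the write later.
    """
    rf = len(replica_up)
    acks = sum(replica_up)   # the write is sent to every replica
    pending = rf - acks      # down replicas must catch up later
    return acks >= required_acks(cl, rf), pending


# RF=3 with two replicas down: CL.ONE succeeds, CL.QUORUM does not.
print(write([True, False, False], "ONE"))
print(write([True, False, False], "QUORUM"))
# RF=3 with everything up: CL.ONE still reaches all three replicas.
print(write([True, True, True], "ONE"))
```

The CL changes only the success threshold; the set of replicas the
write is sent to is the same in every case.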

The reason to use QUORUM is to get strong consistency, in the sense
that a read issued after a successful write is guaranteed to see that
write (provided both the read and the write use QUORUM).

Another reason is to guarantee that no write is lost if a single node
suddenly evaporates/explodes/kernel panics.
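
Both reasons come down to simple counting. The sketch below is
illustrative only; `quorum`, `read_sees_latest_write` and
`survives_node_loss` are hypothetical names, with W and R being the
number of replicas that must ack a write and a read respectively.

```python
def quorum(rf):
    """Smallest majority of rf replicas."""
    return rf // 2 + 1


def read_sees_latest_write(w, r, rf):
    """True if every read set must overlap every write set.

    If W + R > RF, any R replicas include at least one of the W
    replicas that acked the write.
    """
    return w + r > rf


def survives_node_loss(w, lost=1):
    """True if an acked write still exists somewhere after `lost` of
    the acking replicas disappear before the others receive it."""
    return w - lost >= 1


rf = 3
q = quorum(rf)
print(read_sees_latest_write(q, q, rf))   # QUORUM write, QUORUM read
print(read_sees_latest_write(1, 1, rf))   # ONE write, ONE read
print(survives_node_loss(q))              # QUORUM write, one node lost
print(survives_node_loss(1))              # ONE write, one node lost
```

With RF=3, QUORUM is 2, so 2 + 2 > 3 gives the overlap, and an acked
write sits on at least two nodes. At CL.ONE neither property is
guaranteed at the moment the write returns.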

If you don't have strong consistency/durability demands, probably just
use CL.ONE. Data will still be replicated at whatever replication
factor (RF) you have chosen.

> A complication occurs when you want to add another node. Now there's a
> 4 node ring, but only 3 replicas, so each node isn't guaranteed to
> have all of the data, so the cluster can't completely function when
> N-1 nodes are offline. So this is why I would like to have the RF
> scale relative to the size of the cluster. Am I mistaken?

It seems like you have mistaken requirements. Even if for some strange
reason you really want to tie RF to number of nodes, you can just add
a node FIRST, and *then* increase RF. But be advised that increasing
RF implies cluster downtime (if you want to get correct data on reads)
because you have to run a rotating 'nodetool repair' after changing
the replication level.

But I repeat: You almost certainly don't want to be changing RF all
the time. You most likely just want to settle on one particular level
of redundancy, which is going to be the RF, and also implies the
*minimum* number of nodes in the cluster. Then you add more nodes for
*capacity* reasons, but not for redundancy reasons, and there's no
reason to increase RF.

If you *really* know what you're doing and why you want RF to track
total node count, I'm sure there are *some* cases where this makes
sense. But nothing you've said so far really indicates you're in such
a position.

-- 
/ Peter Schuller (@scode on twitter)
