On Tue, Oct 14, 2014 at 4:52 PM, Donald Smith <
donald.sm...@audiencescience.com> wrote:

>  Suppose I create a new DC with 25 nodes. I have their IPs in
> cassandra-topology.properties.  Twenty-three of the nodes start up, but two
> of the nodes fail to start.   If I start replicating (via "nodetool
> rebuild") without those two nodes, then when those 2 nodes enter the DC the
> distribution of tokens to vnodes will change and I'd need to rebuild or
> bootstrap, right?
>
> In other words, it's better to wait til all nodes come up before we start
> replicating.  Does this sound right?
>
> I presume that all the nodes need to come up so it can learn the token
> ranges.
>

I don't understand your question. Vnodes exist to split each physical
node's share of the ring into [n] randomly placed virtual node ranges, 256
by default.

They do this in order to allow you to add the remaining 2 nodes to your
25 node DC without rebalancing the 23 that are already up.

The simplest way to illustrate this is to imagine a token range of 0-20 in
a 4 node cluster with RF=1.

A 0-5
B 5-10
C 10-15
D 15-20 (0)

Each node has 25% of the data. If you add a new node "E", and want it to
join with an even share of the data, there is literally nowhere you can have
it join to accomplish this goal. You have to join it in between two of the
existing nodes, and then move the existing nodes so that the distribution is
even again. This is why, prior to vnodes, the best practice was to double
your cluster size.
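
If it helps, the arithmetic can be sketched in a few lines of Python. This
is only a toy model of the 0-20 ring above, not how Cassandra computes
ownership: the ownership() helper, the ring size of 20, and the convention
that a node owns the range from its token up to the next token are all
assumptions made for illustration.

```python
def ownership(tokens, ring_size=20):
    """Return {node: fraction of the ring owned}, one token per node.

    Toy convention borrowed from the example above: a node with token t
    owns the range from t up to the next token, wrapping around the ring.
    """
    ordered = sorted(tokens.items(), key=lambda kv: kv[1])
    shares = {}
    for i, (node, start) in enumerate(ordered):
        end = ordered[(i + 1) % len(ordered)][1]
        # A single node (end == start) owns the whole ring.
        width = (end - start) % ring_size or ring_size
        shares[node] = width / ring_size
    return shares


# The original 4 node cluster: every node owns a quarter of the ring.
four = {"A": 0, "B": 5, "C": 10, "D": 15}
print(ownership(four))

# Join E halfway through A's range: E only splits A's share,
# while B, C and D keep their original quarter each.
squeezed = dict(four, E=2.5)
print(ownership(squeezed))

# An even 5 node ring needs tokens 4 apart, so existing nodes must move.
balanced = {"A": 0, "B": 4, "C": 8, "D": 12, "E": 16}
moved = sorted(n for n in four if four[n] != balanced[n])
print(ownership(balanced), "moved:", moved)

# Doubling instead: one new node in the middle of every range keeps the
# ring even without moving any of the original four.
doubled = dict(four, E=2.5, F=7.5, G=12.5, H=17.5)
print(ownership(doubled))
```

In this model, adding one node either leaves the ring uneven or forces most
of the existing nodes to move, while doubling stays even with no moves.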

=Rob
http://twitter.com/rcolidba
