Hi Norman,

In short, any time you add a node to a cluster there will be a
redistribution of data, and the amount that moves depends on the total
number of nodes in the cluster: roughly 1/N of the data for a balanced
cluster that ends up with N nodes. VNodes just split each node's share into
smaller chunks and spread them around the ring. If you have a 3-node
cluster with RF=1 (for simplicity's sake) and add 1 node, every existing
node has to reduce its responsibility from 1/3 of the cluster data to 1/4.
The new node will need to accept 1/4 of the total cluster data as part of
joining. Those are the basics, but you can extrapolate from there.
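
To make the 3-to-4 node arithmetic concrete, here is a minimal Python
sketch. It is not Cassandra's token allocation code: the 2**64 token space,
the purely random token choice, and every name in it (make_tokens,
ownership, owner, simulate) are assumptions made for illustration. It
builds a ring for three nodes with RF=1, adds a fourth, and measures how
much of the ring changes owner and where that data goes.

```python
import bisect
import random

RING_SIZE = 2**64  # hypothetical token space, chosen for illustration


def make_tokens(node, count, rng):
    """Pick `count` random tokens (vnodes) for `node`."""
    return [(rng.randrange(RING_SIZE), node) for _ in range(count)]


def ownership(ring):
    """Fraction of the token space owned by each node.

    A token owns the range (previous token, token], wrapping at the ring's end.
    """
    shares = {}
    prev = ring[-1][0] - RING_SIZE  # the first range wraps around from the last token
    for token, node in ring:
        shares[node] = shares.get(node, 0) + (token - prev)
        prev = token
    return {node: span / RING_SIZE for node, span in shares.items()}


def owner(ring, tokens, key_token):
    """Node holding the first vnode token >= key_token, wrapping to the start."""
    i = bisect.bisect_left(tokens, key_token)
    return ring[i % len(ring)][1]


def simulate(vnodes_per_node=256, sample_keys=20_000, seed=1):
    rng = random.Random(seed)
    before = sorted(t for n in ("node1", "node2", "node3")
                    for t in make_tokens(n, vnodes_per_node, rng))
    # Joining only adds the new node's tokens; existing tokens stay where they are.
    after = sorted(before + make_tokens("node4", vnodes_per_node, rng))
    before_tokens = [t for t, _ in before]
    after_tokens = [t for t, _ in after]

    moved, destinations = 0, set()
    for _ in range(sample_keys):
        key = rng.randrange(RING_SIZE)
        old, new = owner(before, before_tokens, key), owner(after, after_tokens, key)
        if old != new:
            moved += 1
            destinations.add(new)
    return ownership(before), ownership(after), moved / sample_keys, destinations


if __name__ == "__main__":
    shares_before, shares_after, moved_fraction, destinations = simulate()
    print("before:", shares_before)
    print("after: ", shares_after)
    print("fraction of sampled keys that moved:", moved_fraction)
    print("moved keys went to:", destinations)
```

In this sketch the only keys that change owner are the ones that land on
the new node, so the moved fraction should track the new node's share of
the ring (about 1/4 here, give or take the randomness of the tokens). Each
existing node gives up many small ranges rather than one large one.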

I would be happy to get on zoom and talk it over. Here's my scheduling
link: https://calendly.com/patrick-mcfadin/30min_zoom

Patrick

On Wed, Jun 29, 2022 at 5:13 AM Norman Menfel <norman34...@gmail.com> wrote:

> Hi all,
>
> apologies for writing to this mailing list but I tried the user mailing
> list, 2 slack channels, reddit and 3 discord channels and got horrible and
> confused answers.
>
> I'm working on a school project trying to reproduce the tokens
> distribution algorithm described in the dynamo db paper. All I want to
> build is a cluster where nodes can join/leave managing vnodes distribution
> just like in Cassandra (I don't care about r/w, replication,....)
>
> I believe I understand how everything works without vnodes. But everything
> stops making sense when introducing vnodes. For example, when a new node
> joins a cluster, new vnodes need to be created. Why does adding vnodes not
> create a massive redistribution of data in the cluster? After all, adding
> vnodes means that every vnode in the cluster has to "give up" some data to
> other vnodes in order to keep a balanced load across the cluster.
>
> From the documentation it seems like only the portion of the ring
> associated with the node should suffer this redistribution, but why does a
> node have a portion of the partition ring associated with it when the
> vnodes stored on the node may be from any portion of the ring?
>
> As you can see, I'm quite confused! I understand that giving me a full
> answer may take too much of your time, but if you could just point me in
> the right direction, tell me where I should look in the source code, or
> share some links (I've already read everything on the Apache website and
> DataStax; I've even read the Riak documentation trying to find clues),
> that would be amazing!
>
> Thanks a lot for your time and keep up the great work, I love Cassandra!
> Norman
>
