Thanks Jeff.
I think what you explained below covers both before and after the introduction
of vnodes. The vnodes part is clear: each node owns many small token ranges,
and those vnodes form a discontiguous set around the ring.

   1. What is not clear is how each node decides which vnodes it will get.
   If the assignment were contiguous, it would have been easy to understand
   (like a plain token range).
   2. Also, the original question of this thread: if each node does not
   replicate all of its vnodes to the same 2 nodes (assuming RF=2), how does
   it decide where each of its vnodes will be replicated to?

Maybe the answer to #2 falls out of the answer to #1.
But I would really appreciate it if someone could help me understand the above.



On Mon, Nov 8, 2021 at 2:00 PM Jeff Jirsa <jji...@gmail.com> wrote:

> Vnodes are implemented by giving a single process multiple tokens.
>
> Tokens ultimately determine which data lives on which node. When you hash
> a partition key, it gives you a token (let's say 570). The 3 processes that
> own token 570 are those owning the next 3 tokens in the ring ABOVE 570, so if you had
> A = 0
> B = 1000
> C = 2000
> D = 3000
> E = 4000
>
> The replicas for data for token=570 are B,C,D
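The clockwise lookup described above can be sketched in a few lines of Python. This is an illustrative toy only, not Cassandra's actual implementation; the `replicas` helper and the ring layout are made up for this example:

```python
from bisect import bisect_right

# Toy single-token ring from the example above: (token, node) pairs,
# sorted by token. Illustrative only, not Cassandra code.
ring = [(0, "A"), (1000, "B"), (2000, "C"), (3000, "D"), (4000, "E")]

def replicas(token, ring, rf=3):
    """Return the rf nodes owning the next rf tokens clockwise above `token`."""
    tokens = [t for t, _ in ring]
    start = bisect_right(tokens, token)
    # Walk clockwise, wrapping around the end of the ring.
    return [ring[(start + k) % len(ring)][1] for k in range(rf)]

print(replicas(570, ring))  # ['B', 'C', 'D']
```

Hashing a key to token 570 lands just above A's token 0, so the walk picks up B (1000), C (2000), and D (3000), matching the text.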
>
>
> When you have vnodes and there's lots of tokens (from the same small set
> of 5 hosts), it'd look closer to:
> A = 0
> C = 100
> A = 300
> B = 700
> D = 800
> B = 1000
> D = 1300
> C = 1700
> B = 1800
> C = 2000
> E = 2100
> B = 2400
> A = 2900
> D = 3000
> E = 4000
>
> In this case, the replicas for token=570 are B, D and C (the walk would go
> B, D, B, D, but we de-duplicate the repeated B and D and keep looking for
> the next non-B/non-D host: C at 1700).
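The same walk with de-duplication over the vnode ring above can be sketched like this (again an illustrative Python toy with made-up helper names; Cassandra's real placement also accounts for racks and datacenters via the replication strategy):

```python
from bisect import bisect_right

# Vnode ring from the example: each node now appears at several tokens.
vnode_ring = [(0, "A"), (100, "C"), (300, "A"), (700, "B"), (800, "D"),
              (1000, "B"), (1300, "D"), (1700, "C"), (1800, "B"),
              (2000, "C"), (2100, "E"), (2400, "B"), (2900, "A"),
              (3000, "D"), (4000, "E")]

def replicas(token, ring, rf=3):
    """Walk clockwise from the first token above `token`, skipping
    nodes already chosen, until rf distinct nodes are collected."""
    tokens = [t for t, _ in ring]
    i = bisect_right(tokens, token) % len(ring)
    out = []
    while len(out) < rf:
        node = ring[i][1]
        if node not in out:  # de-duplicate repeated owners
            out.append(node)
        i = (i + 1) % len(ring)
    return out

print(replicas(570, vnode_ring))  # ['B', 'D', 'C']
```

The walk visits B (700), D (800), skips the repeated B (1000) and D (1300), and stops at C (1700), giving B, D, C as in the text.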
>
> If you want to see a view of this in your own cluster, use `nodetool ring`
> to see the full token ring.
>
> There's no desire to enforce a replication mapping where all data on A is
> replicated to the same set of replicas of A, because the point of vnodes is
> to give A many distinct replicas so when you replace A, it can replicate
> from "many" other sources (maybe a dozen, maybe a hundred). This was super
> important before 4.0, because each replication stream was single threaded
> by SENDER, so vnodes let you use more than 2-3 cores to re-replicate (in
> 4.0, each stream is still single threaded, but we avoid a lot of
> deserialization, so we can saturate a NIC with only a few cores; that was
> much harder to do before).
>
>
> On Mon, Nov 8, 2021 at 1:44 PM Tech Id <tech.login....@gmail.com> wrote:
>
>>
>> Hello,
>>
>> Going through
>> https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/architecture/archDataDistributeDistribute.html
>> .
>>
>> But it is not clear how a node decides where each of its vnodes will be
>> replicated to.
>>
>> As an example from the above page:
>>
>>    1. Why is vnode A present on nodes 1, 2 and 5,
>>    2. BUT vnode B present on nodes 1, 4 and 6?
>>
>>
>> I realize that the diagram is for illustration purposes only, but the
>> idea being conveyed should nevertheless be the same as I suggested above.
>>
>> So how come node 1 decides to put A on itself, 2 and 5, but puts B on
>> itself, 4 and 6?
>> Shouldn't there be consistency here, such that all vnodes present on node 1
>> are replicated to the same set of other nodes?
>>
>> Any clarifications on that would be appreciated.
>>
>> I also understand that different vnodes are replicated to different nodes
>> for performance.
>> But all I want to know is the algorithm it uses to place them on
>> different nodes.
>>
>> Thanks!
>>
>>
