Hello again :), I thought a little bit more about this question, and I was actually wondering if something like this would work:
Imagine a 3-node cluster, and create the nodes using the following settings.

For all 3 nodes: `num_tokens: 4`

Node 1: `initial_token: -9223372036854775808, -4611686018427387905, -2, 4611686018427387901`
Node 2: `initial_token: -7686143364045646507, -3074457345618258604, 1537228672809129299, 6148914691236517202`
Node 3: `initial_token: -6148914691236517206, -1537228672809129303, 3074457345618258600, 7686143364045646503`

If you know the initial size of your cluster, you can calculate the total number of tokens (number of nodes * vnodes) and use the formula / Python code below to get the tokens (there is also a small standalone sketch of the same calculation further down). Then use the first token for the first node, the second token for the second node, and so on, round-robin. In my case there is a total of 12 tokens (3 nodes, 4 tokens each):

```
>>> number_of_tokens = 12
>>> [str(((2**64 // number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]
['-9223372036854775808', '-7686143364045646507', '-6148914691236517206', '-4611686018427387905', '-3074457345618258604', '-1537228672809129303', '-2', '1537228672809129299', '3074457345618258600', '4611686018427387901', '6148914691236517202', '7686143364045646503']
```

It actually works nicely, apparently. Here is a quick ccm test I ran with the configuration above:

```
$ ccm node1 nodetool status tlp_lab
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  82.47 KiB  4       66.7%             1ed8680b-7250-4088-988b-e4679514322f  rack1
UN  127.0.0.2  99.03 KiB  4       66.7%             ab3655b5-c380-496d-b250-51b53efb4c00  rack1
UN  127.0.0.3  82.36 KiB  4       66.7%             ad2b343e-5f6e-4b0d-b79f-a3dfc3ba3c79  rack1
```

Ownership is perfectly distributed, just as it would be without vnodes. Tested with C* 3.11.1 and CCM.

For my second test I followed the procedure we were talking about, after wiping out the data in my 3-node ccm cluster: RF=2 for tlp_lab, the first node with `initial_token` defined, and the other nodes using 'allocate_tokens_for_keyspace: tlp_lab':

```
$ ccm node1 nodetool status tlp_lab
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  86.71 KiB  4       96.2%             6e4c0ce0-2e2e-48ff-b7e0-3653e76366a3  rack1
UN  127.0.0.2  65.63 KiB  4       54.2%             592cda85-5807-4e7a-aa3b-0d9ae54cfaf3  rack1
UN  127.0.0.3  99.04 KiB  4       49.7%             f2c4eccc-31cc-458c-a599-5373c1169d3c  rack1
```

This is not as great. I guess a fourth node would help, but it would still not be perfect. I would still check what happens when you add a few more nodes afterward with 'allocate_tokens_for_keyspace' and without 'initial_token', just to avoid any surprise. I have not seen anyone using this approach yet, so please take it as an idea to dig into, not as a recommendation :).

I also noticed I did not answer the second part of the mail:

> My cluster size won't go beyond 150 nodes, should I still use the
> Allocation Algorithm instead of random with 256 tokens (performance-wise or
> load-balance-wise)?

I would say yes. There is talk of changing this default (256 vnodes), which is now probably always a bad idea since 'allocate_tokens_for_keyspace' was added.

> Is the Allocation Algorithm widely used and tested by the community, and can
> we migrate all clusters of any size to use this algorithm safely?

Here again, I would say yes. I am not sure that it is widely used yet, but I think so.
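If it helps, here is a minimal standalone sketch of the same calculation: it generates evenly spaced Murmur3 tokens and splits them round-robin across the nodes, reproducing the `initial_token` lines listed at the top of this mail. The node and vnode counts are just the example values from above; adapt them to your own cluster.

```
# Minimal sketch: evenly spaced Murmur3 tokens, split round-robin across nodes.
# The counts below are just the example values from this mail (3 nodes, 4 vnodes).
nodes = 3
vnodes = 4  # num_tokens per node

total = nodes * vnodes
# Evenly spaced tokens over the Murmur3 range [-2**63, 2**63 - 1]
tokens = [((2**64 // total) * i) - 2**63 for i in range(total)]

# Round-robin: token 0 -> node 1, token 1 -> node 2, token 2 -> node 3, token 3 -> node 1, ...
for n in range(nodes):
    node_tokens = tokens[n::nodes]
    print("Node %d initial_token: %s" % (n + 1, ", ".join(str(t) for t in node_tokens)))
```

Running it prints the three `initial_token` lines shown above, ready to paste into each node's cassandra.yaml.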
Also, you can always check the ownership with 'nodetool status <keyspace>' after adding the nodes, and before sending data or traffic to this data center, so there is probably no real risk as long as you check the ownership distribution after adding nodes. If you don't like the distribution, you can decommission the nodes, clean them, and try again; I usually call that 'rolling the dice' when I am still using the random algorithm :). (There is a small sketch at the very end of this mail, after the quoted thread, showing one way to summarize the spread.) I mean, once the token range ownership has been distributed to the nodes, it does not change anything during normal operation; we don't need this 'algorithm' after the bootstrap, I would say.

> Out of curiosity, I wonder how people (e.g. at Apple) configure and maintain
> token management of clusters with thousands of nodes?

I am not sure about Apple, but my understanding is that some of those companies don't use vnodes and have a 'ring management tool' to perform the necessary 'nodetool move' operations around the cluster relatively easily or automatically. Some others probably use a low number of vnodes (something between 4 and 32) together with 'allocate_tokens_for_keyspace'. Also, my understanding is that it's very rare to have clusters with thousands of nodes; you can then start having issues around gossip, if I remember correctly what I read/discussed. I would probably add a second cluster when the first one gets too big (hundreds of nodes), or split per service/workflow for example. In practice, the operational complexity is reduced by automating operations and/or having good tooling to operate efficiently.

On Mon, Oct 1, 2018 at 12:37, onmstester onmstester <onmstes...@zoho.com> wrote:

> Thanks Alex,
> You are right, that would be a mistake.
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
> ============ Forwarded message ============
> From : Oleksandr Shulgin <oleksandr.shul...@zalando.de>
> To : "User" <user@cassandra.apache.org>
> Date : Mon, 01 Oct 2018 13:53:37 +0330
> Subject : Re: Re: how to configure the Token Allocation Algorithm
> ============ Forwarded message ============
>
> On Mon, Oct 1, 2018 at 12:18 PM onmstester onmstester <onmstes...@zoho.com> wrote:
>
>> What if instead of running that python and having one node with non-vnode
>> config, I remove the first seed node and re-add it after the cluster was
>> fully up? So the token ranges of the first seed node would also be assigned
>> by the Allocation Alg.
>
> I think this is tricky because the random allocation of the very first
> tokens from the first seed affects the choice of tokens made by the
> algorithm on the rest of the nodes: it basically tries to divide the token
> ranges in more or less equal parts. If your very first 8 tokens resulted
> in really bad balance, you are not going to remove that imbalance by
> removing the node; it would still have a lasting effect on the rest of
> your cluster.
>
> --
> Alex
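As a small illustration of the 'check the ownership before sending traffic' step above, here is a minimal sketch that runs `nodetool status <keyspace>` and reports the min/max effective ownership, so you can decide whether to roll the dice again. The `ownership_spread` helper is purely illustrative (not an existing tool), it assumes the standard nodetool status layout shown earlier, and the command will need adjusting for your setup (e.g. `ccm node1 nodetool` for a ccm cluster).

```
# Minimal sketch: summarize the "Owns (effective)" column of
# `nodetool status <keyspace>` to judge how balanced the ring is.
# Assumes the standard nodetool status layout shown in the examples above.
import re
import subprocess

def ownership_spread(keyspace, nodetool_cmd=("nodetool",)):
    # For a ccm cluster, pass nodetool_cmd=("ccm", "node1", "nodetool")
    out = subprocess.check_output(list(nodetool_cmd) + ["status", keyspace]).decode()
    owns = [float(m.group(1)) for m in re.finditer(r"(\d+(?:\.\d+)?)%", out)]
    return min(owns), max(owns)

if __name__ == "__main__":
    low, high = ownership_spread("tlp_lab")
    print("Effective ownership: min %.1f%%, max %.1f%%" % (low, high))
    # The first test above would give 66.7% / 66.7%,
    # the second one 49.7% / 96.2%.
```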