Sure, it's called "Cassandra Availability with Virtual Nodes", by Joey
Lynch and Josh Snyder.

I found it in the mailing list archives:
https://github.com/jolynch/python_performance_toolkit/blob/master/notebooks/cassandra_availability/whitepaper/cassandra-availability-virtual.pdf

There is some math in there that explains the impact of the number of vnodes
on availability.

Using the formula "1d", and considering a datacenter of 60 nodes in 3
balanced racks with RF = 3 (so Np = 40, the nodes in the other two racks,
and v = 256), we have:

Np*(1-(1-(1/Np))^(v*2*(R-1))) = 40*(1-(1-(1/40))^(256*2*(3-1))) =
39.9999999998
Thus, if my calculation is accurate, with 60 nodes and 256 vnodes we expect
a node to have 39.9999999998 neighbors. This means that with 60 nodes, *each
node* has 40 *possible* replicas (all the nodes in other racks) and will
share a token range with practically all of them. Thus, with 2 nodes down in
distinct racks, an outage is almost guaranteed (it still takes 2 nodes
down).

Here are some other arbitrary numbers showing how this value changes with
the number of nodes and vnodes:

- With 60 nodes and 256 vnodes, expect 39.9999999998 neighbors
- With 60 nodes and 16 vnodes, expect 32.0867407145 neighbors
- With 60 nodes and 4 vnodes, expect 13.323193263 neighbors

- With 300 nodes and 256 vnodes, expect 198.8200470802 neighbors
- With 300 nodes and 16 vnodes, expect 54.8867183963 neighbors
- With 300 nodes and 4 vnodes, expect 15.4137752052 neighbors
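
To let you reproduce these numbers, here is a minimal Python sketch of the
calculation. The function and parameter names are mine, not the paper's. It
assumes balanced racks, so that Np is simply the number of nodes in the other
racks, and it assumes R in the formula is the replication factor.

```python
def expected_neighbors(total_nodes, vnodes, racks=3, rf=3):
    """Expected number of distinct nodes sharing a token range with one node."""
    # Assumption: racks are balanced, so the possible replicas (Np) are
    # all the nodes located in the other racks.
    n_p = total_nodes * (racks - 1) / racks
    # Formula "1d" as quoted above: Np * (1 - (1 - 1/Np)^(v * 2 * (R - 1)))
    # Assumption: R is the replication factor.
    return n_p * (1 - (1 - 1 / n_p) ** (vnodes * 2 * (rf - 1)))


# Print the same node/vnode combinations as the list above.
for total_nodes in (60, 300):
    for vnodes in (256, 16, 4):
        neighbors = expected_neighbors(total_nodes, vnodes)
        print(f"{total_nodes} nodes, {vnodes} vnodes: {neighbors:.4f} neighbors")
```

Changing the racks, rf or vnodes arguments should give a quick idea of how
many neighbors to expect for your own topology, under the same assumptions.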


Happy reading :).

C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Mon, Jan 21, 2019 at 13:30, VICTOR IBARRA <vic2...@gmail.com> wrote:

> Hi Alain,
>
> Thank you very much for the explanation and the points on the subject of
> managing vnodes.
>
> You mentioned the Netflix paper and the outage? Do you have the link to
> that discussion?
>
> Thank you for your help.
> Best regards
>
> On Mon, Jan 21, 2019 at 13:53, Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
>> There has been some discussion on this topic in this mailing list,
>> including a paper from Netflix on the impact of vnodes. I could not find
>> it quickly, but I invite you to check.
>>
>> To share some ideas:
>>
>> More vnodes:
>> + Better balance between nodes
>> + Maximizes the streaming throughput for operations, as all nodes share a
>> small bit of the data of all the other nodes (depending on the topology).
>> - When nodes fail, there is a higher chance of losing availability. With
>> 256 vnodes, for example, 2 nodes down in distinct racks would for sure make
>> data partially unavailable.
>> - Overheads / operational issues (in practice, using 256 vnodes has been
>> a nightmare for multiple reasons, see below)
>>
>>
>> Fewer vnodes:
>> - Imbalances can be big before C* 3.0. After that, using
>> allocate_tokens_for_keyspace -->
>> http://cassandra.apache.org/doc/4.0/configuration/cassandra_config_file.html#allocate-tokens-for-keyspace,
>> you can mitigate this issue. With this and some techniques*, you can get
>> good results in terms of balance.
>> * Off the top of my head: this involves bootstrapping the seeds first,
>> picking the tokens to use, creating your keyspace, then adding nodes with
>> the option above. You can test it quite easily. Then with "nodetool status
>> <keyspace>" you can check the ownership balance.
>> - The streaming throughput is generally limited by the receiving host
>> when using vnodes, thus 16 vnodes is probably not worse than 256 in terms
>> of streaming.
>> + The other way around, the overhead of having 256 vnodes makes
>> operations such as repair almost impossible, or at least much longer and
>> more complex. Repairing almost-empty tables can take minutes, and
>> repairing a big dataset might never end.
>> + In the Netflix paper on this topic (very interesting, I recommend
>> reading it), it is explained that reducing the number of vnodes reduces the
>> chances of an outage.
>> + There was a discussion on the dev mailing list. I believe the community
>> agreed on the need to reduce the default number of vnodes. Here again,
>> you can have a quick look at the archives, Jira, github/trunk.
>>
>> I think that commonly accepted values would be 16/32. Values as low as 4
>> are considered to improve availability and reduce the overhead induced by
>> vnodes. I would suggest you test it and see whether, with low values, you
>> still manage to keep the balance between nodes.
>>
>> Also, using "physical" nodes (initial_token, no vnodes) makes it
>> possible to reason about token distribution. You can perform advanced
>> operations where you bootstrap 1/3 of the cluster at once. This is very
>> good, especially for big clusters, I would say. With many vnodes, on the
>> other hand, you'll have to add one node at a time, as each node is
>> actually the 'neighbor' of all the others (depending on the topology
>> again, i.e. racks/data centers...).
>>
>> I would stay away from the default in this case (256 vnodes). I think
>> this value is way too high by default.
>>
>> Also, keep in mind that the number of vnodes cannot be changed in a
>> running cluster. I think the best way to change it is to add a new data
>> center.
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - al...@thelastpickle.com
>> France / Spain
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>>
>> On Mon, Jan 21, 2019 at 11:08, VICTOR IBARRA <vic2...@gmail.com>
>> wrote:
>>
>>>
>>> Good morning everyone,
>>>
>>> I would like to get in touch with the Cassandra community about
>>> cluster configuration questions.
>>>
>>> I currently have many questions and several projects concerning
>>> Cassandra cluster configuration, the general problems of
>>> configuration migration, and the use of vnodes.
>>>
>>> My main question is: what is the gain of using 256 vnodes vs
>>> 16 vnodes, for example?
>>>
>>> Best regards
>>> --
>>>  The integrity of this message cannot be guaranteed on the Internet.
>>> VICTOR IBARRA can not therefore be considered liable for the  contents
>>> including its attachments. Any unauthorized use or dissemination is
>>> prohibited. If you are not the intended recipient of  this message, then
>>> please delete it and notify the sender.
>>>
>>
>
