There has been some discussion on this topic on this mailing list,
including a paper from Netflix on the impact of vnodes. I could not find
it quickly, but I invite you to look for it.

To share some ideas:

More vnodes:
+ Better balance between nodes.
+ Maximizes the streaming throughput for operations, as all nodes share a
small bit of the data of all the other nodes (according to the topology).
- When nodes fail, there is more chance of losing availability. With 256
vnodes, for example, 2 nodes down in distinct racks would for sure make
data partially unavailable.
- Overheads / operational issues (in practice, using 256 vnodes has been a
nightmare for multiple reasons, see below).
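
To make the availability point a bit more concrete, here is a rough Python
sketch (not Cassandra code). It assumes random token assignment, a
replication factor of 3 and a single rack, which is simpler than a real
topology. All function names are made up for the illustration. It counts
which fraction of node pairs hold replicas of at least one common token
range, i.e. pairs whose simultaneous loss would break QUORUM for some data.

```python
import random
from itertools import combinations

RING = 2**64  # size of the token space (assumption: Murmur3-like 64-bit ring)

def build_ring(num_nodes, vnodes, rng):
    """Give each node `vnodes` random tokens; return sorted (token, node) pairs."""
    ring = [(rng.randrange(RING), node)
            for node in range(num_nodes) for _ in range(vnodes)]
    ring.sort()
    return ring

def replica_sets(ring, rf):
    """For each range, walk the ring clockwise and keep the first `rf` distinct nodes."""
    n = len(ring)
    result = []
    for i in range(n):
        replicas = []
        j = i
        while len(replicas) < rf:
            node = ring[j % n][1]
            if node not in replicas:
                replicas.append(node)
            j += 1
        result.append(frozenset(replicas))
    return result

def fraction_of_risky_pairs(num_nodes, vnodes, rf=3, seed=0):
    """Fraction of node pairs that are both replicas of at least one range."""
    if num_nodes < rf:
        raise ValueError("need at least rf nodes")
    rng = random.Random(seed)
    risky = set()
    for replicas in replica_sets(build_ring(num_nodes, vnodes, rng), rf):
        risky.update(frozenset(pair) for pair in combinations(replicas, 2))
    return len(risky) / (num_nodes * (num_nodes - 1) // 2)

if __name__ == "__main__":
    for vnodes in (1, 4, 16, 256):
        print(vnodes, round(fraction_of_risky_pairs(30, vnodes), 3))
```

With one token per node, only close neighbors on the ring share data. As
the number of vnodes grows, the fraction should get close to 1, meaning
almost any two nodes down at once affect some range.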


Fewer vnodes:
- Imbalances can be big before C* 3.0. From 3.0 onwards, using
allocate_tokens_for_keyspace -->
http://cassandra.apache.org/doc/4.0/configuration/cassandra_config_file.html#allocate-tokens-for-keyspace,
you can mitigate this issue. With this and some techniques*, you can get
good results in terms of balance.
* Off the top of my head: this involves bootstrapping the seeds first,
picking the tokens to use, creating your keyspace, then adding nodes with
the option above. You can test it quite easily. Then with "nodetool status
<keyspace>" you can check the ownership balance.
- The streaming throughput is generally limited by the receiving host when
using vnodes, thus 16 vnodes is probably not worse than 256 in terms of
streaming.
+ The other way around, the overhead of having 256 vnodes makes operations
such as repair almost impossible, or at least way longer and more complex.
Repairing almost empty tables can take minutes, and repairing a big
dataset might never end.
+ In the Netflix paper on this topic (very interesting, I recommend
reading it), it is explained that reducing the number of vnodes reduces the
chances of an outage.
+ There was a discussion on the dev mailing list. I believe the community
agreed on the need to reduce the default number of vnodes. Here again,
you can have a quick look at the archive, Jira, github/trunk.
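
To illustrate the imbalance issue with few vnodes and purely random tokens
(i.e. without allocate_tokens_for_keyspace), here is a small Python sketch.
It is a simulation under simplifying assumptions (random tokens on a 64-bit
ring, primary-range ownership only, no replication), not what Cassandra
does internally, and the function names are hypothetical.

```python
import random

RING = 2**64  # size of the token space (assumption: 64-bit ring)

def ownership(num_nodes, vnodes, seed=0):
    """Share of the ring owned by each node when tokens are picked at random."""
    rng = random.Random(seed)
    tokens = sorted((rng.randrange(RING), node)
                    for node in range(num_nodes) for _ in range(vnodes))
    owned = [0] * num_nodes
    prev = tokens[-1][0] - RING  # wrap around: the first range starts at the last token
    for token, node in tokens:
        owned[node] += token - prev  # a token owns the range (prev, token]
        prev = token
    return [o / RING for o in owned]

def imbalance(num_nodes, vnodes, seed=0):
    """Ratio between the most and the least loaded node (1.0 = perfect balance)."""
    shares = ownership(num_nodes, vnodes, seed)
    return max(shares) / min(shares)

if __name__ == "__main__":
    for vnodes in (4, 16, 256):
        print(vnodes, round(imbalance(12, vnodes), 2))
```

With random tokens, the ratio should shrink as the number of vnodes grows,
which is why low values need a smarter token allocation to stay balanced.
On a real cluster, "nodetool status <keyspace>" remains the reference.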

I think that commonly accepted values would be 16/32. Values as low as 4
are considered, to improve availability and reduce the overheads induced by
vnodes. I would suggest you test it and see whether, with low values, you
still manage to keep the balance between nodes.

Also, using "physical" nodes (initial_token, no vnodes) makes it possible
to reason about token distribution. You can perform advanced operations
where you bootstrap 1/3 of the cluster at once. This is very good,
especially for big clusters, I would say. With many vnodes, on the other
hand, you have to add one node at a time, as each node is actually the
'neighbor' of all the others (according to the topology again, i.e.
racks/data centers...).
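
As an example of reasoning about tokens with single-token nodes, evenly
spaced initial_token values can be computed by hand. A minimal sketch,
assuming the Murmur3Partitioner token range (-2^63 to 2^63 - 1) and a
single data center; the helper name is made up:

```python
def even_initial_tokens(num_nodes):
    """Evenly spaced tokens over the Murmur3Partitioner range [-2**63, 2**63 - 1]."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return [-2**63 + (i * 2**64) // num_nodes for i in range(num_nodes)]

if __name__ == "__main__":
    # One value per node, to be set as initial_token in each node's config.
    for node, token in enumerate(even_initial_tokens(6)):
        print(f"node {node}: initial_token = {token}")
```

Because the layout is predictable, you know exactly which nodes are
neighbors, which is what makes adding several nodes at once manageable.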

I would stay away from the default in this case (256 vnodes). I think this
value is way too high by default.

Also, keep in mind that the number of vnodes cannot be changed in a running
cluster. The best way to change it is to add a new data center, I think.

C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


On Mon, 21 Jan 2019 at 11:08, VICTOR IBARRA <vic2...@gmail.com> wrote:

>
> Good morning everyone,
>
> I would like to get in contact with the Cassandra community about
> questions of cluster configuration.
>
> Today I have many questions and different projects concerning the
> configuration of Cassandra clusters, the general problems of
> configuration migration, and the use of vnodes.
>
> The main question is: what is the gain of using 256 vnodes vs 16
> vnodes, for example?
>
> Best regards
