Have a look at this: http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
The vnodes mechanism is there to provide better scalability as nodes are added or removed, by allowing a single node to own several small chunks of the token range. Aside from that, the process is exactly the same as in the single-token-per-node case: the coordinator calculates the token from the partition key and locates the responsible node in the same way. SSTables are stored on the node's disk per Cassandra table, with no reference to vnodes at all. The term "virtual nodes" is a bit misleading in that sense.

Actually, the number of vnodes is configurable: it's set with the num_tokens parameter in cassandra.yaml. That setting is per node, so the cluster-wide total is num_tokens summed over all nodes rather than one fixed constant.

Alec

From: Sergi Vladykin [mailto:sergi.vlady...@gmail.com]
Sent: Friday, 25 December 2015 8:31 AM
To: user@cassandra.apache.org
Subject: Re: Data rebalancing algorithm

Thanks a lot for your answers!

Paulo, I'll take a look at the classes you've suggested.

Jack, the link you've provided lacks a description of how virtual nodes are mapped to physical sstables/indexes on disk.

To be more exact, I have the following more detailed questions:

1. How are vnodes mapped to sstables and indexes? Is one vnode a separate part of the sstable, or is all the data from all vnodes just mixed in the SSTable, or maybe something else?
2. As far as I can see, Cassandra does not have a predefined constant total number of vnodes for the whole cluster, right? Does it mean that on rebalancing some parts of the data already mapped to some vnodes will be remapped to new vnodes on the new node?
3. How long can the rebalancing take if we have, let's say, 1TB of data on a single node and we are adding one more node to the cluster?
Sergi

2015-12-24 19:26 GMT+03:00 Jack Krupansky <jack.krupan...@gmail.com>:

Read details here: https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html

-- Jack Krupansky

On Thu, Dec 24, 2015 at 11:09 AM, Paulo Motta <pauloricard...@gmail.com> wrote:

The new node will own some parts (ranges) of the ring according to the ring tokens the node is responsible for. These tokens are defined by the yaml property initial_token (manual assignment) or num_tokens (random assignment).

During the bootstrap process, the sstable sections containing the ranges the new node is responsible for are streamed as raw data from the nodes that previously owned those ranges, and the sstables are rebuilt on the joining node. After each sstable is transferred, the new node rebuilds the primary and secondary indexes, bloom filters, etc., and at the end of the bootstrap process the new sstables are added to the live data set.

See org.apache.cassandra.dht.BootStrapper.java and org.apache.cassandra.streaming.StreamReceiveTask on the trunk branch for more information.

ps: I don't particularly recall any document with specific details, so if anyone knows of one, please feel welcome to share. If you want more theoretical information, see the ring membership sections of the Cassandra and/or Dynamo papers.

2015-12-24 13:14 GMT-02:00 Sergi Vladykin <sergi.vlady...@gmail.com>:

Guys,

I was not able to find in the docs or on Google a detailed description of the data rebalancing algorithm. I mean how Cassandra moves SSTables when a new node connects to the cluster, how primary and secondary indexes get transferred to this new node, etc.

Can anyone provide relevant links, please, or just reply here? I can read the source code of course, but it would be nice if someone could answer right away :)

Sergi
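For anyone reading this thread later, here is a minimal sketch of the two ideas discussed above: the coordinator finding the owner of a partition key on a ring where each node holds several tokens, and a joining node taking over ranges from the nodes that owned them before. It is a toy model, not Cassandra's code. ToyRing, token_for and in_range are hypothetical names, MD5 stands in for the real partitioner (Cassandra's default is Murmur3Partitioner), the token space is simplified, and replication is ignored.

```python
import bisect
import hashlib
import random

RING_SIZE = 2 ** 64  # toy token space; not Cassandra's real token range


def token_for(partition_key):
    """Hash a partition key to a token (MD5 stand-in for the real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def in_range(token, start, end):
    """True if token lies in the ring range (start, end], which may wrap around."""
    if start < end:
        return start < token <= end
    return token > start or token <= end


class ToyRing:
    def __init__(self):
        self.tokens = []  # all tokens of all nodes, kept sorted
        self.owner = {}   # token -> name of the node holding it

    def node_for_token(self, token):
        """Owner of the first ring token >= token, wrapping to the smallest."""
        i = bisect.bisect_left(self.tokens, token)
        if i == len(self.tokens):
            i = 0
        return self.owner[self.tokens[i]]

    def node_for_key(self, partition_key):
        return self.node_for_token(token_for(partition_key))

    def add_node(self, name, num_tokens, rng):
        """Give `name` num_tokens random tokens (the num_tokens style of assignment).

        Returns the ranges it takes over as (start, end, previous_owner).
        """
        taken = []
        for _ in range(num_tokens):
            token = rng.randrange(RING_SIZE)
            while token in self.owner:  # avoid token collisions
                token = rng.randrange(RING_SIZE)
            if self.tokens:
                previous_owner = self.node_for_token(token)
                if previous_owner != name:  # skip ranges the new node already holds
                    i = bisect.bisect_left(self.tokens, token)
                    # i - 1 == -1 wraps to the largest token on the ring
                    taken.append((self.tokens[i - 1], token, previous_owner))
            bisect.insort(self.tokens, token)
            self.owner[token] = name
        return taken


rng = random.Random(42)  # fixed seed so the toy run is repeatable
ring = ToyRing()
ring.add_node("node1", 4, rng)
ring.add_node("node2", 4, rng)
for start, end, source in ring.add_node("node3", 4, rng):
    print("node3 takes (%d, %d] from %s" % (start, end, source))
print(ring.node_for_key("some-partition-key"))
```

The point of the sketch is that the lookup never mentions vnodes: it is one sorted list of tokens, and a node with many tokens simply appears in it many times. The ranges returned by add_node correspond to the sstable sections that would be streamed to the joining node during bootstrap; keys outside those ranges keep their previous owner.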