Have a look at this:
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2

The vnodes mechanism is there to provide better scalability as new nodes are 
added/removed, by allowing a single node to own several small chunks of the 
token range.

Aside from that, the process is exactly the same as without vnodes (one token 
per node): the coordinator calculates the token from the partition key and 
locates the responsible node in the same way. SSTables are stored on the node's 
disk per Cassandra table, with no reference to vnodes at all. The term "virtual 
nodes" is a bit misleading in that sense.
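
To make the lookup concrete, here is a minimal Python sketch of the idea, not 
Cassandra's actual code. The Ring class and token_for function are made-up 
names, MD5 stands in for the real partitioner hash (the default partitioner 
uses Murmur3), and replication is ignored, so only the first owner is returned.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token (stand-in hash, not Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class Ring:
    """Hypothetical token ring: each node may hold several tokens (vnodes)."""

    def __init__(self, tokens_by_node):
        # Flatten {node: [tokens]} into one list of (token, node) sorted by token.
        self._entries = sorted(
            (token, node)
            for node, tokens in tokens_by_node.items()
            for token in tokens
        )
        self._tokens = [token for token, _ in self._entries]

    def owner(self, token: int) -> str:
        # A ring token T covers the range (previous token, T], so the owner is
        # the node holding the first ring token >= the key's token, wrapping
        # around to the smallest token past the end of the ring.
        i = bisect.bisect_left(self._tokens, token)
        return self._entries[i % len(self._entries)][1]


ring = Ring({"node1": [-100, 50], "node2": [0, 200]})
print(ring.owner(token_for("some-partition-key")))
```

The lookup is the same whether a node holds one token or 256; vnodes only 
change how many entries each node contributes to the sorted list.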

Actually, the number of vnodes is configurable, but per node rather than per 
cluster: it's set with the num_tokens parameter in cassandra.yaml, so the 
cluster-wide total is the sum over all nodes.

Alec

From: Sergi Vladykin [mailto:sergi.vlady...@gmail.com]
Sent: Friday, 25 December 2015 8:31 AM
To: user@cassandra.apache.org
Subject: Re: Data rebalancing algorithm

Thanks a lot for your answers!
Paulo, I'll take a look at the classes you've suggested.
Jack, the link you've provided lacks a description of how virtual nodes are 
mapped to physical sstables/indexes on disk.
To be more exact, I have the following more detailed questions:

1. How are vnodes mapped to sstables and indexes? Is one vnode a separate part 
of the sstable, or is all the data from all vnodes just mixed in the SSTable, or 
maybe something else?

2. As far as I can see, Cassandra does not have a predefined constant total 
number of vnodes for the whole cluster, right? Does that mean that on 
rebalancing some parts of the data already mapped to some vnodes will be 
remapped to new vnodes on the new node?

3. How long can the rebalancing take if we have, let's say, 1 TB of data on a 
single node and we are adding one more node to the cluster?

Sergi


2015-12-24 19:26 GMT+03:00 Jack Krupansky 
<jack.krupan...@gmail.com>:
Read details here:
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html


-- Jack Krupansky

On Thu, Dec 24, 2015 at 11:09 AM, Paulo Motta 
<pauloricard...@gmail.com> wrote:
The new node will own some parts (ranges) of the ring according to the ring 
tokens the node is responsible for. These tokens are defined from the yaml 
property initial_token (manual assignment) or num_tokens (random assignment).
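
As a rough illustration of what random assignment implies for ownership, the 
Python sketch below picks random tokens for a joining node and lists the ranges 
it would take over, together with the node that owned each range before. The 
function names are made up for this example, replication is ignored, and the 
code assumes the new tokens do not collide with existing ones; it is not how 
Cassandra itself computes this.

```python
import bisect
import random

# Token range of the Murmur3 partitioner (signed 64-bit).
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1


def random_tokens(num_tokens, rng):
    """Pick num_tokens random ring positions, as with random assignment."""
    return [rng.randint(MIN_TOKEN, MAX_TOKEN) for _ in range(num_tokens)]


def ranges_taken_over(existing, new_tokens):
    """List (start_exclusive, end_inclusive, previous_owner) for a joining node.

    existing: {node: [tokens]} for the nodes already in the ring.
    A range with start > end wraps around the end of the ring.
    """
    ring = sorted((t, n) for n, ts in existing.items() for t in ts)
    old_tokens = [t for t, _ in ring]
    all_tokens = sorted(set(old_tokens) | set(new_tokens))
    moved = []
    for token in sorted(new_tokens):
        # The new range starts at the previous token on the combined ring
        # (index -1 wraps to the largest token).
        start = all_tokens[all_tokens.index(token) - 1]
        # Before the join, this range belonged to the node holding the first
        # existing token >= the new token, wrapping around.
        i = bisect.bisect_left(old_tokens, token)
        moved.append((start, token, ring[i % len(ring)][1]))
    return moved


existing = {"node1": [0], "node2": [100]}
print(ranges_taken_over(existing, [50]))
print(ranges_taken_over(existing, random_tokens(4, random.Random(42))))
```

In this model each new token splits one existing range in two, and only the 
part up to the new token changes owner; the rest of the ring is untouched.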

During the bootstrap process, raw data from the sstable sections containing the 
ranges the node is responsible for is transferred from the nodes that previously 
owned those ranges to the new node, so the source sstables are rebuilt on the 
joining node. After each sstable is transferred to the new node, it rebuilds 
primary and secondary indexes, bloom filters, etc., and at the end of the 
bootstrap process the new sstables are added to the live data set.
See org.apache.cassandra.dht.BootStrapper.java and 
org.apache.cassandra.streaming.StreamReceiveTask of the trunk branch for more 
information.
PS: I don't particularly recall any document with specific details, so if 
anyone knows of one, please feel free to share. If you want more theoretical 
information, see the ring membership sections of the Cassandra and/or Dynamo 
papers.
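
The "sstable sections" part of the bootstrap description above can be pictured 
with a small Python sketch. It assumes an sstable is modelled as a list of 
partition tokens in ascending order and that the ranges do not wrap around the 
ring; sections_for_ranges is a made-up helper, not a Cassandra API.

```python
import bisect


def sections_for_ranges(sstable_tokens, ranges):
    """Return (lo, hi) index slices of a token-sorted sstable to stream.

    sstable_tokens: partition tokens in ascending order.
    ranges: (start_exclusive, end_inclusive) pairs with start < end.
    """
    sections = []
    for start, end in ranges:
        lo = bisect.bisect_right(sstable_tokens, start)  # first token > start
        hi = bisect.bisect_right(sstable_tokens, end)    # one past last token <= end
        if lo < hi:  # skip ranges with no data in this sstable
            sections.append((lo, hi))
    return sections


tokens = [-50, 0, 10, 50, 60, 120]
for lo, hi in sections_for_ranges(tokens, [(0, 50), (100, 200)]):
    print(tokens[lo:hi])
```

Because the file is ordered by token, the data for a range is one contiguous 
slice, so no per-vnode file layout is needed on the source node.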


2015-12-24 13:14 GMT-02:00 Sergi Vladykin 
<sergi.vlady...@gmail.com>:
Guys,
I was not able to find a detailed description of the data rebalancing algorithm 
in the docs or on Google.
I mean how Cassandra moves SSTables when a new node joins the cluster, how
primary and secondary indexes get transferred to this new node, etc.

Can anyone provide relevant links please or just reply here?
I can read source code of course, but it would be nice if someone could answer 
right away :)

Sergi



