https://stackoverflow.com/questions/48776589/cassandra-cant-one-use-snapshots-to-rapidly-scale-out-a-cluster/48778179#48778179
So the basic question is: if one

1. records the tokens of an existing node, via `nodetool ring | grep ip_address_of_node | awk '{print $NF ","}' | xargs` for the desired node IP,
2. takes snapshots on that node,
3. transfers the snapshots to a new node (not yet attached to the cluster),
4. sets `initial_token` in cassandra.yaml on the new node,
5. sets up the schema to match, and
6. has the new node join the cluster,

would that allow quick scale-up of nodes and replication of data? I don't care if the vnode map changes after the initial join, or if data starts being streamed off as it rebalances.

- Is there an issue if the vnode tokens for two nodes are identical? Do they have to be distinct for each node?
- Is it that it mucks with the RF, since there will be a greater RF than normal?
- Is this just not practically faster than an sstable load?

Basically, I was wondering if we could just use this to double the number of nodes with identical copies of the node data via snapshots, and then later on Cassandra can pare down which nodes own which data.
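For reference, the token-capture step could be sketched roughly as below. This is only a sketch under assumptions, not a tested procedure: it assumes, as the pipeline above does, that each `nodetool ring` row starts with the node address and ends with the token. The function names `tokens_for_node` and `initial_token_line` are made up for illustration. Unlike the `grep` version, it matches the address exactly (so `10.0.0.1` does not also match `10.0.0.10`) and leaves no trailing comma.

```shell
#!/usr/bin/env bash
# Sketch: turn `nodetool ring` output into an initial_token line for one node.
# Assumption: the node address is the first column and the token is the last
# column of each row, which is what the original pipeline relies on.

# tokens_for_node IP
# Reads `nodetool ring` output on stdin and prints the tokens listed for IP
# as one comma-separated line.
tokens_for_node() {
  awk -v ip="$1" '$1 == ip { print $NF }' | paste -s -d ',' -
}

# initial_token_line IP
# Same input, formatted as a cassandra.yaml setting.
initial_token_line() {
  printf 'initial_token: %s\n' "$(tokens_for_node "$1")"
}
```

Usage would be something like `nodetool ring | initial_token_line ip_address_of_node`, with the resulting line pasted into cassandra.yaml on the new node before it starts.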