The amount of time and disk space that cassandra-shuffle requires when upgrading to vnodes should be made apparent in the documentation (once some is written).
The only semi-noticeable remark about the exorbitant amount of time is a bullet point in: http://wiki.apache.org/cassandra/VirtualNodes/Balance

"Shuffling will entail moving a lot of data around the cluster and so has the potential to consume a lot of disk and network I/O, and to take a considerable amount of time. For this to be an online operation, the shuffle will need to operate on a lower priority basis to other streaming operations, and should be expected to take days or weeks to complete."

We tried running shuffle on a QA version of our cluster and two things were brought to light:

- Even with no reads/writes, it was going to take 20 days
- Each machine needed enough free disk space to potentially hold the entire cluster's sstables on disk

Regards,
John