Can you provide some info on the number of nodes, node load, cluster load, etc.?

AFAIK shuffle was not an easy thing to test, and it does not get much real-world
use: only some people will run it, and they (normally) run it only once.

Any info you can provide may help improve the process. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/04/2013, at 9:21 AM, John Watson <j...@disqus.com> wrote:

> The amount of time/space cassandra-shuffle requires when upgrading to using 
> vnodes should really be apparent in documentation (when some is made).
> 
> The only semi-noticeable remark about the exorbitant amount of time is a
> bullet point in: http://wiki.apache.org/cassandra/VirtualNodes/Balance
> 
> "Shuffling will entail moving a lot of data around the cluster and so has the 
> potential to consume a lot of disk and network I/O, and to take a 
> considerable amount of time. For this to be an online operation, the shuffle 
> will need to operate on a lower priority basis to other streaming operations, 
> and should be expected to take days or weeks to complete."
> 
> We tried running shuffle on a QA version of our cluster and 2 things were 
> brought to light:
>  - Even with no reads/writes it was going to take 20 days
>  - Each machine needed enough free disk space to potentially hold the entire
> cluster's sstables on disk
> 
> Regards,
> 
> John
