> We initially tried to run a shuffle; however, it seemed to be going really 
> slowly (very little progress when watching "cassandra-shuffle ls | wc -l" 
> after 5-6 hours, and no errors in the logs),
My guess is that shuffle was not designed to be as efficient as possible, since 
it is only meant to be used once. Was it continuing to make progress?
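
If it was, the easiest way to tell is to sample the remaining relocations a 
little while apart on the same node, re-using the command you were already 
watching:

    cassandra-shuffle ls | wc -l

If that count is shrinking, the shuffle is progressing, just slowly; if it is 
flat, it is worth checking system.log on each node for streaming errors.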

> so we cancelled it and instead added 3 nodes to the cluster, waited for them 
> to bootstrap, and then decommissioned the first 3 nodes. 
You added the 3 new nodes with num_tokens set in the yaml file?
What does nodetool status say?
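
For reference, what I'd expect on the 3 new nodes is roughly this in 
cassandra.yaml, set before each node starts for the first time (256 is just 
the commonly suggested value, adjust to taste):

    num_tokens: 256
    # leave initial_token unset when num_tokens is used

and then, once the new nodes had finished bootstrapping, something like:

    nodetool status
    nodetool decommission    # run on each of the old nodes, one at a time

On a vnode cluster, nodetool status should show a large token count per node 
and a reasonably even load across the ring.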

Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/03/2013, at 9:41 AM, Andrew Bialecki <andrew.biale...@gmail.com> wrote:

> Just curious if anyone has any thoughts on something we've observed in a 
> small test cluster.
> 
> We had around 100 GB of data on a 3 node cluster (RF=2) and wanted to start 
> using vnodes. We upgraded the cluster to 1.2.2 and then followed the 
> instructions for using vnodes. We initially tried to run a shuffle; however, 
> it seemed to be going really slowly (very little progress when watching 
> "cassandra-shuffle ls | wc -l" after 5-6 hours, and no errors in the logs), 
> so we cancelled it and instead added 3 nodes to the cluster, waited for them 
> to bootstrap, and then decommissioned the first 3 nodes. The total process 
> took about 3 hours. My assumption is that the final result is the same: data 
> is now distributed somewhat randomly across the nodes (assuming no bias in 
> the token ranges selected when bootstrapping a node).
> 
> If that assumption is correct, the observation is that adding nodes and then 
> removing the old ones appears to be a faster way to shuffle data for small 
> clusters. Obviously not always possible, but I thought I'd throw this out 
> there in case anyone runs into a similar situation. This cluster is, 
> unsurprisingly, on EC2 instances, which made provisioning and shutting down 
> nodes extremely easy.
> 
> Cheers,
> Andrew
