Wouldn't shock me if shuffle wasn't all that performant (and that's no knock on shuffle... our case is somewhat specific).
We added 3 nodes with num_tokens=256 and it worked great; the load was evenly spread.

On Sun, Mar 24, 2013 at 1:14 PM, aaron morton <aa...@thelastpickle.com> wrote:

>> We initially tried to run a shuffle, however it seemed to be going really
>> slowly (very little progress by watching "cassandra-shuffle ls | wc -l"
>> after 5-6 hours and no errors in logs),
>
> My guess is that shuffle is not designed to be as efficient as possible, as it
> is only used once. Was it continuing to make progress?
>
>> so we cancelled it and instead added 3 nodes to the cluster, waited for
>> them to bootstrap, and then decommissioned the first 3 nodes.
>
> You added 3 nodes with num_tokens set in the yaml file?
> What does nodetool status say?
>
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 24/03/2013, at 9:41 AM, Andrew Bialecki <andrew.biale...@gmail.com> wrote:
>
>> Just curious if anyone has any thoughts on something we've observed in a
>> small test cluster.
>>
>> We had around 100 GB of data on a 3-node cluster (RF=2) and wanted to
>> start using vnodes. We upgraded the cluster to 1.2.2 and then followed the
>> instructions for using vnodes. We initially tried to run a shuffle; however,
>> it seemed to be going really slowly (very little progress by watching
>> "cassandra-shuffle ls | wc -l" after 5-6 hours and no errors in the logs),
>> so we cancelled it and instead added 3 nodes to the cluster, waited for
>> them to bootstrap, and then decommissioned the first 3 nodes. The total
>> process took about 3 hours. My assumption is that the final result is the
>> same in terms of data being distributed somewhat randomly across the nodes
>> (assuming no bias in the token ranges selected when bootstrapping a node).
>>
>> If that assumption is correct, the observation would be that, where
>> possible, adding nodes and then removing nodes appears to be a faster way
>> to shuffle data for small clusters. Obviously it's not always possible, but
>> I thought I'd throw this out there in case anyone runs into a similar
>> situation. This cluster is, unsurprisingly, on EC2 instances, which made
>> provisioning and shutting down nodes extremely easy.
>>
>> Cheers,
>> Andrew
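For anyone wanting to try the add-then-decommission approach described above, a rough sketch of the procedure follows. This is an assumption-laden outline, not a tested runbook: it assumes a Cassandra 1.2.x cluster, that the new nodes have vnodes enabled via num_tokens before their first start, and that service names and file paths match your install. It requires a live cluster, so run it only after reading the vnodes documentation for your version.

```shell
# On each of the NEW nodes, before first start, enable vnodes in
# cassandra.yaml (path is illustrative; adjust for your install):
#
#   num_tokens: 256
#   # leave initial_token unset when using num_tokens
#
# 1. Start the new nodes ONE AT A TIME and let each finish bootstrapping;
#    nodetool netstats shows the streaming progress:
sudo service cassandra start
nodetool netstats

# 2. Once all new nodes have joined, confirm every node is Up/Normal (UN)
#    and that load is spreading across the new vnode-enabled nodes:
nodetool status

# 3. Retire the ORIGINAL single-token nodes one at a time. Run this ON
#    each old node; it streams that node's data to the remaining nodes:
nodetool decommission

# 4. Verify the final all-vnode cluster looks healthy:
nodetool status
```

As the thread notes, this wholesale node swap can beat cassandra-shuffle on small clusters, at the cost of temporarily running double the hardware while both generations of nodes are up.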