Is this understanding correct: "we had a 12-node cluster with 256 vnodes on each node (upgraded from 1.1); we added two additional nodes that streamed so much data (600+ GB when other nodes had 150-200 GB) during the joining phase that they filled their local disks and had to be killed"?
Can you raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA and update the thread with the ticket number?

Can you show the output from nodetool status so we can get a feel for the ring?
Can you include the logs from one of the nodes that failed to join?

Thanks

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/04/2013, at 10:01 AM, John Watson <[email protected]> wrote:

> On Sun, Apr 28, 2013 at 2:19 PM, aaron morton <[email protected]> wrote:
>> We're going to try running a shuffle before adding a new node again... maybe that will help
>
> I don't think it will hurt but I doubt it will help.
>
> We had to bail on shuffle since we need to add capacity ASAP and not in 20 days.
>
>>> It seems when new nodes join, they are streamed *all* sstables in the cluster.
>
> How many nodes did you join, what was the num_tokens?
> Did you notice streaming from all nodes (in the logs) or are you saying this in response to the cluster load increasing?
>
> Was only adding 2 nodes at the time (planning to add a total of 12.) Starting with a cluster of 12, but now 11 since 1 node entered some weird state when one of the new nodes ran out of disk space.
> num_tokens is set to 256 on all nodes.
> Yes, nearly all current nodes were streaming to the new ones (which was great until disk space was an issue.)
>
>>> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)
>
> Which were the new nodes?
> Can you show the output from nodetool status?
>
> The new nodes are the purple and gray lines above all the others.
>
> nodetool status doesn't show joining nodes. I think I saw a bug already filed for this but I can't seem to find it.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 27/04/2013, at 9:35 AM, Bryan Talbot <[email protected]> wrote:
>
>> I believe that "nodetool rebuild" is used to add a new datacenter, not just a new host to an existing cluster. Is that what you ran to add the node?
>>
>> -Bryan
>>
>> On Fri, Apr 26, 2013 at 1:27 PM, John Watson <[email protected]> wrote:
>> Small relief we're not the only ones that had this issue.
>>
>> We're going to try running a shuffle before adding a new node again... maybe that will help
>>
>> - John
>>
>> On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral <[email protected]> wrote:
>> I am using the same version and observed something similar.
>>
>> I've added a new node, but the instructions from Datastax did not work for me. Then I ran "nodetool rebuild" on the new node. After this command finished, it held twice the load of the other nodes. Even when I ran "nodetool cleanup" on the older nodes, the situation was the same.
>>
>> The problem only seemed to disappear when "nodetool repair" was applied to all nodes.
>>
>> Regards,
>> Francisco Sobral.
>>
>> On Apr 25, 2013, at 4:57 PM, John Watson <[email protected]> wrote:
>>
>>> After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running upgradesstables, I figured it would be safe to start adding nodes to the cluster. Guess not?
>>>
>>> It seems when new nodes join, they are streamed *all* sstables in the cluster.
>>>
>>> https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png
>>>
>>> The gray line machine ran out of disk space and for some reason cascaded into errors in the cluster about 'no host id' when trying to store hints for it (even though it hadn't joined yet).
>>> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)
>>>
>>> I followed this: http://www.datastax.com/docs/1.2/operations/add_replace_nodes
>>>
>>> Is there something missing in that documentation?
>>>
>>> Thanks,
>>>
>>> John
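
For anyone following the add_replace_nodes procedure above with vnodes, here is a minimal sketch of the settings on the joining node and the commands used to watch the join; the cluster name, seed IPs and data path are placeholders, and the option names follow the stock 1.2 cassandra.yaml rather than anything confirmed from this cluster:

    # cassandra.yaml on the joining node
    cluster_name: 'ExampleCluster'       # placeholder; must match the existing cluster
    num_tokens: 256                      # same vnode count as the existing nodes
    # initial_token:                     # leave unset when using vnodes
    auto_bootstrap: true                 # default; the node streams its ranges while joining
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.0.0.1,10.0.0.2"   # placeholder IPs; existing nodes only, never the joining node itself

    # While the node is joining (nodetool status may not list it, per the bug mentioned above):
    nodetool netstats            # streaming progress, per file
    nodetool compactionstats     # pending compactions on the receiving node
    df -h /var/lib/cassandra     # placeholder data path; watch disk headroom during the join

Watching netstats and disk headroom on the joining node is mainly a way to catch the kind of runaway streaming described in this thread before the disk fills.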
