Which node(s) were getting the HostUnavailableException errors - all nodes for every query, or just a small portion of the nodes on some queries?
It may take some time for the gossip state to propagate; maybe some of it is corrupted or needs a full refresh. Were any of the seed nodes in the collection of nodes that were removed? How many seed nodes does each node typically have?

-- Jack Krupansky

On Thu, Mar 3, 2016 at 4:16 PM, Peddi, Praveen <pe...@amazon.com> wrote:

> Thanks Alain for the quick and detailed response. My answers are inline. One thing I want to clarify is that the nodes got recycled due to an automatic health check failure. This means the old nodes are dead and new nodes were added without our intervention. So replacing nodes would not work for us, since the new nodes were already added.
>
>> We are not removing multiple nodes at the same time. All dead nodes are from the same AZ, so there were no errors when the nodes were down, as expected (because we use QUORUM).
>
> Do you use at least 3 distinct AZs? If so, you should indeed be fine regarding data integrity. Also, repair should then work for you. If you have fewer than 3 AZs, then you are in trouble...
>
> Yes, we use 3 distinct AZs and replicate to all 3 AZs, which is why there was absolutely no outage on Cassandra when 8 nodes were recycled (the other two nodes still satisfy quorum consistency).
>
> About the unreachable errors, I believe they can be due to the overload caused by the missing nodes. Pressure on the remaining nodes might be too strong.
>
> It is certainly possible, but we have a beefed-up cluster with <3% CPU, hardly any network I/O, and low disk usage. We have 162 nodes in the cluster and each node doesn’t have more than 80 to 100MB of data.
>
>> However, as soon as I started removing nodes one by one, every time we see lots of timeout and unavailable exceptions, which doesn’t make any sense because I am just removing a node that doesn’t even exist.
>
> This probably added even more load. If you are using vnodes, all the remaining nodes probably started streaming data to each other at the speed of "nodetool getstreamthroughput". AWS network isn't that good, and is probably saturated. Also, do you have phi_convict_threshold configured to a high value, at least 10 or 12? This would avoid nodes being marked down that often.
>
> We are using c3.2xlarge, which has good network throughput (1GB/sec I think). We are using the default value, which is 200MB/sec in 2.0.9. We will play with it in the future and see if this could make any difference, but as I mentioned, the data size on each node is not huge. Regarding phi_convict_threshold, our Cassandra is not bringing itself down. There was a bug in the health check from one of our internal tools, and that tool is recycling the nodes. Nothing to do with Cassandra health. Again, we will keep an eye on it in the future.
>
> What does "nodetool tpstats" output?
>
> Nodetool tpstats on which node? Any node?
>
> Also, you might try to monitor resources and see what happens (my guess is you should focus on iowait, disk usage and network, and keep an eye on CPU too).
>
> We did monitor CPU, disk and network and they are all very low.
>
> A quick fix would probably be to heavily throttle the network on all the nodes and see if it helps:
>
> nodetool setstreamthroughput 2
>
> We will play with this config. 2.0.9 defaults to 200MB/sec, which I think is too high.
>
> If this works, you could incrementally increase it and monitor, find the right tuning, and put it in cassandra.yaml.
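For reference, a minimal sketch of the throttling being suggested above (the yaml key is the 2.0-era setting; note its unit is megabits per second, so the default of 200 is 200 Mbit/s rather than 200 MB/sec):

    # check the current streaming throttle on a node
    nodetool getstreamthroughput

    # throttle streaming hard while dead nodes are being removed
    nodetool setstreamthroughput 2

    # to make the throttle permanent, set the equivalent in cassandra.yaml
    # (requires a restart to take effect):
    # stream_throughput_outbound_megabits_per_sec: 2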
> I opened a ticket a while ago about that issue:
> https://issues.apache.org/jira/browse/CASSANDRA-9509
>
> I voted for this issue. Let's see if it gets picked up :).
>
> I hope this will help you get back to a healthy state, allowing you a fast upgrade ;-).
>
> C*heers,
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-03-02 22:17 GMT+01:00 Peddi, Praveen <pe...@amazon.com>:
>
>> Hi Robert,
>> Thanks for your response.
>>
>> Replication factor is 3.
>>
>> We are in the process of upgrading to 2.2.4. We have had too many performance issues with later versions of Cassandra (I have asked for help related to that in the forum). We are close to getting to similar performance now and hopefully will upgrade in the next few weeks. Lots of testing to do :(.
>>
>> We are not removing multiple nodes at the same time. All dead nodes are from the same AZ, so there were no errors when the nodes were down, as expected (because we use QUORUM). However, as soon as I started removing nodes one by one, every time we see lots of timeout and unavailable exceptions, which doesn’t make any sense because I am just removing a node that doesn’t even exist.
>>
>> From: Robert Coli <rc...@eventbrite.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Wednesday, March 2, 2016 at 2:52 PM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Removing Node causes bunch of HostUnavailableException
>>
>> On Wed, Mar 2, 2016 at 8:10 AM, Peddi, Praveen <pe...@amazon.com> wrote:
>>
>>> We have a few dead nodes in the cluster (Amazon ASG removed those thinking there is an issue with health). Now we are trying to remove those dead nodes from the cluster so that other nodes can take over. As soon as I execute nodetool removenode <ID>, we see lots of HostUnavailableExceptions on both reads and writes. What I am not able to understand is, these are dead nodes and don’t even physically exist. Why would the removenode command cause any outage of nodes in Cassandra when we had no errors whatsoever before removing them? I could not really find a JIRA ticket for this.
>>
>> What is your replication factor?
>>
>> Also, 2.0.9 is meaningfully old at this point; consider upgrading ASAP.
>>
>> Also, removing multiple nodes with removenode means your consistency is pretty hosed. Repair ASAP, but there are potential cases where repair won't help.
>>
>> =Rob
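Pulling the suggestions in this thread together, a rough sequence for removing a dead node might look like the following (the host ID is a placeholder; this is a sketch of the advice above, not a tested runbook):

    # identify the Host ID of the dead (DN) node
    nodetool status

    # throttle streaming first (see above), then remove one dead node at a time
    nodetool removenode <host-id-of-dead-node>

    # watch progress and thread-pool pressure on the live nodes
    nodetool removenode status
    nodetool tpstats

    # once all dead nodes are gone, repair the primary range on every node
    # to restore consistency, as Rob suggests
    nodetool repair -pr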