Hi Praveen,

How is this going? I have been out for a while. Did you manage to remove the nodes? Do you need more help? If so, I could use a status update and more information about the remaining issues.
C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-04 19:39 GMT+01:00 Peddi, Praveen <pe...@amazon.com>:

> Hi Jack,
> My answers below…
>
> What is the exact exception you are getting and where do you get it? Is it
> UnavailableException or NoHostAvailableException, and does it occur on the
> client, using the Java driver?
>
> We saw different types of exceptions. The ones I could quickly grep are:
> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra
> timeout during write query at consistency SERIAL (2 replica were required
> but only 1 acknowledged the write)
> com.datastax.driver.core.exceptions.UnavailableException: Not enough
> replica available for query at consistency QUORUM (2 required but only 1
> alive)
> QueryTimeoutException
>
> What is your LoadBalancingPolicy?
>
> new TokenAwarePolicy(new RoundRobinPolicy())
>
> What consistency level is the client using?
>
> QUORUM for reads. For writes, some APIs use SERIAL and some use QUORUM,
> depending on whether we want to do optimistic locking.
>
> What retry policy is the client using?
>
> Default Retry Policy.
>
> When you say that the failures don't last for more than a few minutes, you
> mean from the moment you perform the nodetool removenode? And is operation
> completely normal after those few minutes?
>
> That is correct. All operations recover from failures after a few minutes.
>
> -- Jack Krupansky
>
> On Thu, Mar 3, 2016 at 4:40 PM, Peddi, Praveen <pe...@amazon.com> wrote:
>
>> Hi Jack,
>>
>> Which node(s) were getting the HostNotAvailable errors - all nodes for
>> every query, or just a small portion of the nodes on some queries?
>>
>> Not all reads/writes are failing with Unavailable or Timeout exceptions.
>> Write failures were around 10% of total calls. Reads were a little worse
>> (as bad as 35% of total calls).
>>
>> It may take some time for the gossip state to propagate; maybe some of it
>> is corrupted or needs a full refresh.
>>
>> Were any of the seed nodes in the collection of nodes that were removed?
>> How many seed nodes does each node typically have?
>>
>> We currently use all hosts as seed hosts, which I know is a very bad idea,
>> and we are going to fix that soon. The reason we use all hosts as seed
>> hosts is that these hosts can get recycled for many reasons and we didn't
>> want to hard-code the host names, so we get the host names
>> programmatically (we wrote our own seed host provider). Could that be the
>> reason for these failures? If a dead node is in the seed nodes list and we
>> try to remove that node, could that lead to a blip of failures? The
>> failures don't last for more than a few minutes.
>>
>> -- Jack Krupansky
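A note on the custom seed host provider mentioned above: in Cassandra 2.0.x this is a class implementing org.apache.cassandra.locator.SeedProvider, referenced from the seed_provider section of cassandra.yaml. The sketch below is only an illustration of the idea, not the provider from the thread; DynamicSeedProvider and discoverSeedHostnames() are hypothetical names, and the Map constructor mirrors the reflective instantiation Cassandra uses for its own SimpleSeedProvider.

    import java.net.InetAddress;
    import java.net.UnknownHostException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    import org.apache.cassandra.locator.SeedProvider;

    // Hypothetical dynamic seed provider: resolves the seed list at runtime
    // instead of hard-coding host names in cassandra.yaml.
    public class DynamicSeedProvider implements SeedProvider
    {
        private final Map<String, String> parameters;

        // Cassandra constructs the provider with the "parameters" block
        // declared under seed_provider in cassandra.yaml.
        public DynamicSeedProvider(Map<String, String> parameters)
        {
            this.parameters = parameters;
        }

        @Override
        public List<InetAddress> getSeeds()
        {
            List<InetAddress> seeds = new ArrayList<InetAddress>();
            for (String hostname : discoverSeedHostnames())
            {
                try
                {
                    seeds.add(InetAddress.getByName(hostname));
                }
                catch (UnknownHostException e)
                {
                    // Skip hosts that no longer resolve (e.g. recycled
                    // instances) so dead nodes do not linger as seeds.
                }
            }
            return seeds;
        }

        // Placeholder for whatever discovery mechanism is in use (for
        // example, querying the Auto Scaling group). Returning every live
        // host here reproduces the "all hosts are seeds" setup described
        // in the thread.
        private List<String> discoverSeedHostnames()
        {
            return new ArrayList<String>();
        }
    }

Returning only a small, stable subset of hosts (two or three per AZ) rather than every node would address the "all hosts as seed hosts" concern raised above.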
>> On Thu, Mar 3, 2016 at 4:16 PM, Peddi, Praveen <pe...@amazon.com> wrote:
>>
>>> Thanks Alain for the quick and detailed response. My answers inline. One
>>> thing I want to clarify is that the nodes got recycled due to an
>>> automatic health check failure. This means the old nodes are dead and
>>> new nodes got added without our intervention, so replacing nodes would
>>> not work for us since the new nodes were already added.
>>>
>>>> We are not removing multiple nodes at the same time. All dead nodes are
>>>> from the same AZ, so there were no errors when the nodes were down, as
>>>> expected (because we use QUORUM).
>>>
>>> Do you use at least 3 distinct AZs? If so, you should indeed be fine
>>> regarding data integrity. Also, repair should then work for you. If you
>>> have fewer than 3 AZs, then you are in trouble...
>>>
>>> Yes, we use 3 distinct AZs and replicate to all 3 AZs, which is why,
>>> when 8 nodes were recycled, there was absolutely no outage on Cassandra
>>> (the other two nodes still satisfy quorum consistency).
>>>
>>> About the unreachable errors, I believe they can be due to the overload
>>> caused by the missing nodes. Pressure on the remaining nodes might be
>>> too strong.
>>>
>>> It is certainly possible, but we have a beefed-up cluster with <3% CPU,
>>> hardly any network I/O, and low disk usage. We have 162 nodes in the
>>> cluster and each node doesn't have more than 80 to 100 MB of data.
>>>
>>>> However, as soon as I started removing nodes one by one, every time we
>>>> see a lot of timeout and unavailable exceptions, which doesn't make any
>>>> sense because I am just removing a node that doesn't even exist.
>>>
>>> This probably added even more load: if you are using vnodes, all the
>>> remaining nodes probably started streaming data to each other at the
>>> speed of "nodetool getstreamthroughput". The AWS network isn't that
>>> good, and is probably saturated. Also, have you configured
>>> phi_convict_threshold to a high value, at least 10 or 12? This would
>>> avoid nodes being marked down that often.
>>>
>>> We are using c3.2xlarge, which has good network throughput (1 GB/sec I
>>> think). We are using the default value, which is 200 MB/sec in 2.0.9. We
>>> will play with it in the future and see if it makes any difference, but
>>> as I mentioned, the data size on each node is not huge.
>>> Regarding phi_convict_threshold, our Cassandra is not bringing itself
>>> down. There was a bug in the health check from one of our internal
>>> tools, and that tool is recycling the nodes. Nothing to do with
>>> Cassandra health. Again, we will keep an eye on it in the future.
>>>
>>> What does "nodetool tpstats" output?
>>>
>>> nodetool tpstats on which node? Any node?
>>>
>>> Also, you might try to monitor resources and see what happens (my guess
>>> is you should focus on iowait, disk usage and network, and keep an eye
>>> on CPU too).
>>>
>>> We did monitor CPU, disk and network, and they are all very low.
>>>
>>> A quick fix would probably be to throttle streaming hard on all the
>>> nodes and see if it helps:
>>>
>>> nodetool setstreamthroughput 2
>>>
>>> We will play with this config. 2.0.9 defaults to 200 MB/sec, which I
>>> think is too high.
>>>
>>> If this works, you could incrementally increase it and monitor, find the
>>> right tuning, and put it in cassandra.yaml.
>>>
>>> I opened a ticket a while ago about that issue:
>>> https://issues.apache.org/jira/browse/CASSANDRA-9509
>>>
>>> I voted for this issue. Let's see if it gets picked up :).
>>>
>>> I hope this will help you to get back to a healthy state, allowing you a
>>> fast upgrade ;-).
>>>
>>> C*heers,
>>> -----------------------
>>> Alain Rodriguez - al...@thelastpickle.com
>>> France
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
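For what it's worth, the same throttle as "nodetool setstreamthroughput 2" can be applied programmatically over JMX, which is convenient on a 162-node cluster. The sketch below is only an illustration: it assumes the default JMX port 7199 with no authentication, and that stream throughput is exposed as the StreamThroughputMbPerSec attribute of the org.apache.cassandra.db:type=StorageService MBean, which is what nodetool drives in 2.0.x.

    import javax.management.Attribute;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Throttle streaming on a single node, roughly equivalent to running
    // "nodetool -h <host> setstreamthroughput 2" against it.
    public class ThrottleStreaming
    {
        public static void main(String[] args) throws Exception
        {
            String host = args.length > 0 ? args[0] : "127.0.0.1";
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try
            {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
                Object before = mbs.getAttribute(ss, "StreamThroughputMbPerSec");
                // The value is in megabits per second, the unit nodetool uses;
                // 2 is an aggressive throttle meant only for testing whether
                // streaming is saturating the network.
                mbs.setAttribute(ss, new Attribute("StreamThroughputMbPerSec", 2));
                System.out.println(host + ": stream throughput " + before + " -> 2");
            }
            finally
            {
                connector.close();
            }
        }
    }

If the blips stop with the throttle in place, that points at streaming load rather than at the removenode call itself.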
>>> 2016-03-02 22:17 GMT+01:00 Peddi, Praveen <pe...@amazon.com>:
>>>
>>>> Hi Robert,
>>>> Thanks for your response.
>>>>
>>>> Replication factor is 3.
>>>>
>>>> We are in the process of upgrading to 2.2.4. We have had too many
>>>> performance issues with later versions of Cassandra (I have asked for
>>>> help related to that in the forum). We are close to getting to similar
>>>> performance now and hopefully will upgrade in the next few weeks. Lots
>>>> of testing to do :(.
>>>>
>>>> We are not removing multiple nodes at the same time. All dead nodes are
>>>> from the same AZ, so there were no errors when the nodes were down, as
>>>> expected (because we use QUORUM). However, as soon as I started
>>>> removing nodes one by one, every time we see a lot of timeout and
>>>> unavailable exceptions, which doesn't make any sense because I am just
>>>> removing a node that doesn't even exist.
>>>>
>>>> From: Robert Coli <rc...@eventbrite.com>
>>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> Date: Wednesday, March 2, 2016 at 2:52 PM
>>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> Subject: Re: Removing Node causes bunch of HostUnavailableException
>>>>
>>>> On Wed, Mar 2, 2016 at 8:10 AM, Peddi, Praveen <pe...@amazon.com> wrote:
>>>>
>>>>> We have a few dead nodes in the cluster (Amazon ASG removed those
>>>>> thinking there was an issue with health). Now we are trying to remove
>>>>> those dead nodes from the cluster so that other nodes can take over.
>>>>> As soon as I execute nodetool removenode <ID>, we see lots of
>>>>> HostUnavailableExceptions, both on reads and writes. What I am not
>>>>> able to understand is that these are dead nodes and don't even
>>>>> physically exist. Why would the removenode command cause any outage of
>>>>> nodes in Cassandra when we had no errors whatsoever before removing
>>>>> them? I could not really find a JIRA ticket for this.
>>>>
>>>> What is your replication factor?
>>>>
>>>> Also, 2.0.9 is meaningfully old at this point; consider upgrading ASAP.
>>>>
>>>> Also, removing multiple nodes with removenode means your consistency is
>>>> pretty hosed. Repair ASAP, but there are potential cases where repair
>>>> won't help.
>>>>
>>>> =Rob
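Putting the client-side details from the thread together (TokenAwarePolicy over RoundRobinPolicy, QUORUM reads and writes, SERIAL only for the optimistic-locking writes, and the driver's default retry policy), the setup corresponds roughly to the Java driver 2.x configuration sketched below. The contact points and the ks.jobs table are placeholders, not anything taken from the thread.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;
    import com.datastax.driver.core.policies.DefaultRetryPolicy;
    import com.datastax.driver.core.policies.RoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class ClientSetup
    {
        public static void main(String[] args)
        {
            Cluster cluster = Cluster.builder()
                    .addContactPoints("10.0.0.1", "10.0.0.2", "10.0.0.3") // placeholders
                    .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
                    .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
                    // QUORUM as the default consistency for reads and writes.
                    .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.QUORUM))
                    .build();
            Session session = cluster.connect();

            // Conditional (LWT) write used for optimistic locking. The Paxos
            // round runs at SERIAL, which is where the "consistency SERIAL
            // (2 replica were required but only 1 acknowledged the write)"
            // WriteTimeoutException quoted at the top of the thread comes from.
            Statement insert = new SimpleStatement(
                    "INSERT INTO ks.jobs (id, owner) VALUES (1, 'owner-1') IF NOT EXISTS");
            insert.setSerialConsistencyLevel(ConsistencyLevel.SERIAL);
            ResultSet rs = session.execute(insert);
            System.out.println("applied: " + rs.one().getBool("[applied]"));

            cluster.close();
        }
    }

With RF=3 across 3 AZs, QUORUM tolerates one replica per token range being down, which matches the observation above that the dead nodes alone caused no errors until removenode started streaming.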