Hi Praveen, how is this going?

I have been out for a while. Did you manage to remove the nodes? Do you
need more help? If so, I could use a status update and more information
about the remaining issues.

C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-04 19:39 GMT+01:00 Peddi, Praveen <pe...@amazon.com>:

> Hi Jack,
> My answers below…
>
> What is the exact exception you are getting and where do you get it? Is it
> UnavailableException or NoHostAvailableException and does it occur on the
> client, using the Java driver?
>
> We saw different types of exceptions. The ones I could quickly grep are:
> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra
> timeout during write query at consistency SERIAL (2 replica were required
> but only 1 acknowledged the write)
> com.datastax.driver.core.exceptions.UnavailableException: Not enough
> replica available for query at consistency QUORUM (2 required but only 1
> alive)
> QueryTimeoutException
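>
> For context, these are DataStax Java driver exception classes, so they
> surface in client code roughly like this (a minimal sketch; "session" and
> "statement" are placeholders for whatever our API layer is executing):
>
>     try {
>         session.execute(statement);
>     } catch (com.datastax.driver.core.exceptions.WriteTimeoutException e) {
>         // replicas did not acknowledge within the write timeout
>     } catch (com.datastax.driver.core.exceptions.UnavailableException e) {
>         // the coordinator believed too few replicas were alive for the CL
>     }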
>
>
>
> What is your LoadBalancingPolicy?
>
> new TokenAwarePolicy(new RoundRobinPolicy())
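>
> Wired into the driver, that looks roughly like this (a minimal sketch
> assuming the DataStax Java driver 2.x builder API; the contact point is a
> placeholder):
>
>     import com.datastax.driver.core.Cluster;
>     import com.datastax.driver.core.policies.RoundRobinPolicy;
>     import com.datastax.driver.core.policies.TokenAwarePolicy;
>
>     Cluster cluster = Cluster.builder()
>         .addContactPoint("10.0.0.1")   // placeholder contact point
>         .withLoadBalancingPolicy(
>             new TokenAwarePolicy(new RoundRobinPolicy()))
>         .build();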
>
>
> What consistency level is the client using?
>
> QUORUM for reads. For writes, some APIs use SERIAL and some use QUORUM,
> depending on whether we want to do optimistic locking.
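>
> A sketch of the optimistic-locking path (hypothetical keyspace, table,
> columns, and variables; assuming the driver 2.x query builder): the
> conditional update is a lightweight transaction, which is where the SERIAL
> consistency in the WriteTimeoutException above comes from.
>
>     import static com.datastax.driver.core.querybuilder.QueryBuilder.eq;
>     import static com.datastax.driver.core.querybuilder.QueryBuilder.set;
>     import com.datastax.driver.core.ConsistencyLevel;
>     import com.datastax.driver.core.Statement;
>     import com.datastax.driver.core.querybuilder.QueryBuilder;
>
>     Statement write = QueryBuilder.update("my_ks", "my_table")
>         .with(set("payload", newValue))
>         .where(eq("id", id))
>         .onlyIf(eq("version", expectedVersion));              // LWT condition
>     write.setConsistencyLevel(ConsistencyLevel.QUORUM);       // commit phase
>     write.setSerialConsistencyLevel(ConsistencyLevel.SERIAL); // Paxos phase
>     session.execute(write);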
>
>
> What retry policy is the client using?
>
> Default Retry Policy
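>
> (In other words, we leave the driver's retry policy at its default, which
> should be the equivalent of adding this to the Cluster builder shown above,
> assuming driver 2.x:
>
>     .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
>
> and that policy never retries at a lower consistency level.)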
>
>
> When you say that the failures don't last for more than a few minutes, you
> mean from the moment you perform the nodetool removenode? And are operations
> completely normal after those few minutes?
>
> That is correct. All operations recover from failures after a few minutes.
>
>
>
> -- Jack Krupansky
>
> On Thu, Mar 3, 2016 at 4:40 PM, Peddi, Praveen <pe...@amazon.com> wrote:
>
>> Hi Jack,
>>
>> Which node(s) were getting the HostNotAvailable errors - all nodes for
>> every query, or just a small portion of the nodes on some queries?
>>
>> Not all reads/writes are failing with Unavailable or Timeout exceptions.
>> Write failures were around 10% of total calls. Reads were a little worse (as
>> bad as 35% of total calls).
>>
>>
>> It may take some time for the gossip state to propagate; maybe some of it
>> is corrupted or needs a full refresh.
>>
>> Were any of the seed nodes in the collection of nodes that were removed?
>> How many seed nodes does each node typically have?
>>
>> We currently use all hosts as seed hosts, which I know is a very bad idea,
>> and we are going to fix that soon. The reason we use all hosts as seed hosts
>> is that these hosts can get recycled for many reasons and we didn’t want to
>> hard-code the host names, so we fetch the host names programmatically (we
>> wrote our own seed host provider). Could that be the reason for these
>> failures? If a dead node is in the seed node list and we try to remove that
>> node, could that lead to a blip of failures? The failures don’t last for
>> more than a few minutes.
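>>
>> (For illustration, a hypothetical sketch of a provider along those lines --
>> discoverCurrentHosts() is a placeholder for whatever call returns the
>> current host names, and Cassandra constructs the class reflectively with
>> the parameters map from cassandra.yaml:)
>>
>>     import java.net.InetAddress;
>>     import java.net.UnknownHostException;
>>     import java.util.ArrayList;
>>     import java.util.List;
>>     import java.util.Map;
>>     import org.apache.cassandra.locator.SeedProvider;
>>
>>     public class DynamicSeedProvider implements SeedProvider {
>>         public DynamicSeedProvider(Map<String, String> parameters) { }
>>
>>         @Override
>>         public List<InetAddress> getSeeds() {
>>             List<InetAddress> seeds = new ArrayList<InetAddress>();
>>             for (String host : discoverCurrentHosts()) {
>>                 try {
>>                     seeds.add(InetAddress.getByName(host));
>>                 } catch (UnknownHostException e) {
>>                     // skip hosts that no longer resolve
>>                 }
>>             }
>>             return seeds;
>>         }
>>
>>         private List<String> discoverCurrentHosts() {
>>             // placeholder: ask the fleet/ASG for the current host names
>>             return new ArrayList<String>();
>>         }
>>     }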
>>
>>
>>
>> -- Jack Krupansky
>>
>> On Thu, Mar 3, 2016 at 4:16 PM, Peddi, Praveen <pe...@amazon.com> wrote:
>>
>>> Thanks Alain for the quick and detailed response. My answers are inline.
>>> One thing I want to clarify: the nodes got recycled due to an automatic
>>> health check failure. This means the old nodes are dead and new nodes got
>>> added without our intervention, so replacing nodes would not work for us
>>> since the new nodes were already added.
>>>
>>>
>>>
>>>> We are not removing multiple nodes at the same time. All dead nodes are
>>>> from the same AZ, so there were no errors when the nodes were down, as
>>>> expected (because we use QUORUM).
>>>
>>>
>>> Do you use at least 3 distinct AZs? If so, you should indeed be fine
>>> regarding data integrity, and repair should then work for you. If you have
>>> fewer than 3 AZs, then you are in trouble...
>>>
>>> Yes, we use 3 distinct AZs and replicate to all 3 AZs, which is why there
>>> was absolutely no outage on Cassandra when the 8 nodes were recycled (the
>>> two remaining replicas still satisfy quorum consistency).
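>>>
>>> (With RF = 3 and one replica per AZ, QUORUM needs floor(3/2) + 1 = 2
>>> replicas, so losing every node in a single AZ still leaves the 2 replicas
>>> each QUORUM read or write requires.)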
>>>
>>>
>>> About the unreachable errors, I believe they can be due to overload caused
>>> by the missing nodes. Pressure on the remaining nodes might be too strong.
>>>
>>> It is certainly possible, but we have a beefed-up cluster with <3% CPU,
>>> hardly any network I/O, and low disk usage. We have 162 nodes in the
>>> cluster and each node doesn’t have more than 80 to 100 MB of data.
>>>
>>>
>>>
>>>> However, as soon as I started removing nodes one by one, every time we
>>>> see a lot of timeout and unavailable exceptions, which doesn’t make any
>>>> sense because I am just removing a node that doesn’t even exist.
>>>>
>>>
>>> This probably added even more load: if you are using vnodes, all the
>>> remaining nodes probably started streaming data to each other at the speed
>>> given by "nodetool getstreamthroughput". AWS network isn't that good and is
>>> probably saturated. Also, have you configured phi_convict_threshold to a
>>> high value, at least 10 or 12? This would avoid nodes being marked down
>>> that often.
>>>
>>> We are using c3.2xlarge, which has good network throughput (around 1 Gbps,
>>> I think). We are using the default stream throughput, which is 200 Mb/s in
>>> 2.0.9. We will play with it in the future and see if it makes any
>>> difference, but as I mentioned the data size on each node is not huge.
>>> Regarding phi_convict_threshold, our Cassandra is not bringing itself down.
>>> There was a bug in a health check in one of our internal tools, and that
>>> tool is recycling the nodes. Nothing to do with Cassandra health. Again, we
>>> will keep an eye on it in the future.
>>>
>>>
>>> What does "nodetool tpstats" output?
>>>
>>> Nodetool tpstats on which node? Any node?
>>>
>>>
>>> Also, you might try to monitor resources and see what happens (my guess is
>>> you should focus on iowait, disk usage, and network, and keep an eye on CPU
>>> too).
>>>
>>> We did monitor CPU, disk, and network, and they are all very low.
>>>
>>>
>>> A quick fix would probably be to throttle streaming hard on all the nodes
>>> and see if it helps:
>>>
>>> nodetool setstreamthroughput 2
>>>
>>> We will play with this config. 2.0.9 defaults to 200 Mb/s, which I think
>>> is too high.
>>>
>>>
>>> If this works, you could incrementally increase it while monitoring, find
>>> a good value, and put it in cassandra.yaml.
>>>
>>> I opened a ticket a while ago about that issue:
>>> https://issues.apache.org/jira/browse/CASSANDRA-9509
>>>
>>> I voted for this issue. Let's see if it gets picked up :).
>>>
>>>
>>> I hope this will help you get back to a healthy state, allowing you a fast
>>> upgrade ;-).
>>>
>>> C*heers,
>>> -----------------------
>>> Alain Rodriguez - al...@thelastpickle.com
>>> France
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> 2016-03-02 22:17 GMT+01:00 Peddi, Praveen <pe...@amazon.com>:
>>>
>>>> Hi Robert,
>>>> Thanks for your response.
>>>>
>>>> Replication factor is 3.
>>>>
>>>> We are in the process of upgrading to 2.2.4. We have had too many
>>>> performance issues with later versions of Cassandra (I have asked for help
>>>> related to that in the forum). We are close to reaching similar performance
>>>> now and hopefully will upgrade in the next few weeks. Lots of testing to
>>>> do :(.
>>>>
>>>> We are not removing multiple nodes at the same time. All dead nodes are
>>>> from the same AZ, so there were no errors when the nodes were down, as
>>>> expected (because we use QUORUM). However, as soon as I started removing
>>>> nodes one by one, every time we see a lot of timeout and unavailable
>>>> exceptions, which doesn’t make any sense because I am just removing a node
>>>> that doesn’t even exist.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> From: Robert Coli <rc...@eventbrite.com>
>>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> Date: Wednesday, March 2, 2016 at 2:52 PM
>>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> Subject: Re: Removing Node causes bunch of HostUnavailableException
>>>>
>>>> On Wed, Mar 2, 2016 at 8:10 AM, Peddi, Praveen <pe...@amazon.com>
>>>> wrote:
>>>>
>>>>> We have a few dead nodes in the cluster (Amazon ASG removed them thinking
>>>>> there was an issue with their health). Now we are trying to remove those
>>>>> dead nodes from the cluster so that other nodes can take over. As soon as I
>>>>> execute nodetool removenode <ID>, we see lots of HostUnavailableExceptions
>>>>> on both reads and writes. What I am not able to understand is that these
>>>>> are dead nodes and don’t even physically exist. Why would the removenode
>>>>> command cause any outage in Cassandra when we had no errors whatsoever
>>>>> before removing them? I could not really find a JIRA ticket for this.
>>>>>
>>>>
>>>> What is your replication factor?
>>>>
>>>> Also, 2.0.9 is meaningfully old at this point; consider upgrading ASAP.
>>>>
>>>> Also, removing multiple nodes with removenode means your consistency is
>>>> pretty hosed. Repair ASAP, but there are potential cases where repair won't
>>>> help.
>>>>
>>>> =Rob
>>>>
>>>>
>>>
>>>
>>
>
