Can you show the output of "nodetool tpstats" on one of the affected nodes?
That will give some indication of where the trouble might be.

Patrick

On Tue, Apr 19, 2016 at 6:54 AM, sai krishnam raju potturi <
pskraj...@gmail.com> wrote:

> Hi,
>    Do you see any hung processes, such as repairs, on those 3 nodes? What
> does "nodetool netstats" show?
>
> thanks
> Sai
>
> On Tue, Apr 19, 2016 at 8:24 AM, Erik Forsberg <forsb...@opera.com> wrote:
>
>> Hi!
>>
>> I have a problem where 3 of my 84 nodes misbehave with excessively long GC
>> pauses, which leads to them being marked as DN.
>>
>> This happens when I load data to them using CQL from a hadoop job, so
>> quite a lot of inserts at a time. The CQL loading job is using
>> TokenAwarePolicy with fallback to DCAwareRoundRobinPolicy. Cassandra java
>> driver version 2.1.7.1 is in use.
>>
>> My other observation is that around the time the GC starts to work like
>> crazy, there is a lot of outbound network traffic from the troublesome
>> nodes. If a healthy node has around 25 Mbit/s in, 25 Mbit/s out, an
>> unhealthy one sees 25 Mbit/s in, 200 Mbit/s out.
>>
>> So, something is iffy with these 3 nodes, but I have some trouble finding
>> out exactly what makes them differ.
>>
>> This is Cassandra 2.0.13 (yes, old) using vnodes. The keyspace is using
>> NetworkTopologyStrategy with replication factor 2, in one datacenter.
>>
>> One thing I know I'm doing wrong is that I have a slightly different number
>> of hosts in each of my 6 chassis (one of them has 15 nodes, one has 13,
>> and the remaining four have 14). Could what I'm seeing here be the effect of
>> that?
>>
>> Other ideas on what could be wrong? Some kind of vnode imbalance? How can
>> I diagnose that? What metrics should I be looking at?
>>
>> Thanks,
>> \EF
>>
>>
>>
>
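
One way to get a feel for the chassis-size and vnode-imbalance questions quoted above is to simulate them offline. The sketch below is a rough illustration under stated assumptions, not Cassandra's actual placement code: it treats each chassis as a rack, assumes 256 vnodes per node (the thread does not state num_tokens), draws tokens uniformly at random on [0, 1) instead of the real partitioner range, and uses a simplified NetworkTopologyStrategy-style rule that walks the ring clockwise and takes the next node in a rack not yet used. The names simulate_ownership and per_rack_mean are hypothetical helpers made up for this example.

```python
import random
from collections import defaultdict


def simulate_ownership(rack_sizes, vnodes=256, rf=2, seed=0):
    """Return ({node: fraction of the ring it replicates}, {node: rack}).

    Simplified model: random tokens, one replica per rack, rf <= racks.
    """
    non_empty_racks = sum(1 for size in rack_sizes if size > 0)
    if rf > non_empty_racks:
        raise ValueError("this sketch assumes rf <= number of non-empty racks")

    rng = random.Random(seed)
    ring = []        # entries are (token, node, rack)
    node_rack = {}
    node = 0
    for rack, size in enumerate(rack_sizes):
        for _ in range(size):
            node_rack[node] = rack
            ring.extend((rng.random(), node, rack) for _ in range(vnodes))
            node += 1
    ring.sort()

    n = len(ring)
    owned = defaultdict(float)
    for i, (token, _, _) in enumerate(ring):
        # Width of the range (previous token, token], wrapping around at 1.0.
        width = (token - ring[i - 1][0]) % 1.0
        racks_used = set()
        placed = 0
        j = i
        # Walk clockwise, taking the first node found in each unused rack.
        while placed < rf:
            _, candidate, candidate_rack = ring[j % n]
            if candidate_rack not in racks_used:
                racks_used.add(candidate_rack)
                owned[candidate] += width
                placed += 1
            j += 1
    return dict(owned), node_rack


def per_rack_mean(owned, node_rack):
    """Average replicated fraction per node, grouped by rack."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for node, rack in node_rack.items():
        totals[rack] += owned.get(node, 0.0)
        counts[rack] += 1
    return {rack: totals[rack] / counts[rack] for rack in sorted(totals)}


if __name__ == "__main__":
    # Chassis sizes from the thread: one with 15, one with 13, four with 14.
    owned, node_rack = simulate_ownership([15, 13, 14, 14, 14, 14])
    for rack, mean in per_rack_mean(owned, node_rack).items():
        print(f"rack {rack}: mean replicated fraction per node = {mean:.4%}")
    print(f"least loaded node: {min(owned.values()):.4%}")
    print(f"most loaded node:  {max(owned.values()):.4%}")
```

Comparing the per-rack means with the spread between the least and most loaded node gives a rough idea of whether uneven chassis sizes or random vnode placement contributes more to imbalance in this model. On the real cluster, the effective ownership that "nodetool status" reports when given the keyspace name is the number to compare across the 3 troublesome nodes and the healthy ones.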
