Can you show the output of "nodetool tpstats" from one of the affected nodes? That should give some indication of where the trouble is.

Patrick

On Tue, Apr 19, 2016 at 6:54 AM, sai krishnam raju potturi <pskraj...@gmail.com> wrote:

> Hi,
> Do we see any hung processes, such as repairs, on those 3 nodes? What does
> "nodetool netstats" show?
>
> thanks
> Sai
>
> On Tue, Apr 19, 2016 at 8:24 AM, Erik Forsberg <forsb...@opera.com> wrote:
>
>> Hi!
>>
>> I have this problem where 3 of my 84 nodes misbehave with too long GC
>> times, leading to them being marked as DN.
>>
>> This happens when I load data to them using CQL from a hadoop job, so
>> quite a lot of inserts at a time. The CQL loading job is using
>> TokenAwarePolicy with fallback to DCAwareRoundRobinPolicy. Cassandra java
>> driver version 2.1.7.1 is in use.
>>
>> My other observation is that around the time the GC starts to work like
>> crazy, there is a lot of outbound network traffic from the troublesome
>> nodes. If a healthy node has around 25 Mbit/s in, 25 Mbit/s out, an
>> unhealthy one sees 25 Mbit/s in, 200 Mbit/s out.
>>
>> So, something is iffy with these 3 nodes, but I have some trouble finding
>> out exactly what makes them differ.
>>
>> This is Cassandra 2.0.13 (yes, old) using vnodes. Keyspace is using
>> NetworkTopologyStrategy with replication 2, in one datacenter.
>>
>> One thing I know I'm doing wrong is that I have a slightly differing number
>> of hosts in each of my 6 chassis (one of them has 15 nodes, one has
>> 13, the remaining have 14). Could what I'm seeing here be the effect of
>> that?
>>
>> Other ideas on what could be wrong? Some kind of vnode imbalance? How can
>> I diagnose that? What metrics should I be looking at?
>>
>> Thanks,
>> \EF
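
On the question of whether the uneven chassis sizes (15/13/14 nodes) could matter: here is a back-of-the-envelope sketch in Python. It is not Cassandra code, and it rests on assumptions that are not stated in the thread: each chassis is configured as its own rack, every node has the same number of randomly placed vnode tokens, and with RF=2 the second replica goes to the next token on the ring that belongs to a different rack. The function name `expected_load_per_node` is made up for this illustration.

```python
# Rough estimate only; assumes one rack per chassis, random vnode tokens,
# equal num_tokens per node, and RF=2 with replicas on distinct racks.
from fractions import Fraction


def expected_load_per_node(rack_sizes):
    """Return {rack_size: expected share of all replica data held by one node}."""
    total = sum(rack_sizes)
    shares = {}
    for j, size_j in enumerate(rack_sizes):
        # First replica: every node owns about 1/total of the ring.
        primary = Fraction(1, total)
        # Second replica: the primary sits in some other rack i with
        # probability size_i/total; the next token outside rack i is then
        # equally likely to belong to any of the (total - size_i) other nodes.
        secondary = sum(
            Fraction(size_i, total) * Fraction(1, total - size_i)
            for i, size_i in enumerate(rack_sizes)
            if i != j
        )
        shares[size_j] = primary + secondary
    return shares


racks = [15, 13, 14, 14, 14, 14]  # chassis sizes from the original post
shares = expected_load_per_node(racks)
mean = Fraction(2, sum(racks))  # RF=2 replicas spread evenly over all nodes
for size in sorted(shares):
    print(f"rack of {size}: {float(shares[size] / mean):.3f}x the mean load")
```

Under these assumptions the per-node difference between the 13-node and 15-node chassis comes out at roughly one or two percent, which by itself seems too small to explain three nodes out of 84 behaving very differently. Actual ownership depends on the real token layout, so comparing the "Owns" column of "nodetool status" for the keyspace across nodes is the more direct check.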