A few misbehaving nodes

Erik Forsberg Tue, 19 Apr 2016 05:25:28 -0700

Hi!

I have a problem where 3 of my 84 nodes misbehave with overly long GC times, which leads to them being marked as DN (down).

This happens when I load data into them using CQL from a Hadoop job, so there are quite a lot of inserts at a time. The CQL loading job uses TokenAwarePolicy with DCAwareRoundRobinPolicy as the fallback (child) policy. Cassandra Java driver version 2.1.7.1 is in use.

My other observation is that around the time the GC starts to work like crazy, there is a lot of outbound network traffic from the troublesome nodes. Where a healthy node has around 25 Mbit/s in and 25 Mbit/s out, an unhealthy one sees 25 Mbit/s in and 200 Mbit/s out.

So something is iffy with these 3 nodes, but I'm having some trouble finding out exactly what makes them different.

This is Cassandra 2.0.13 (yes, old) using vnodes. The keyspace uses NetworkTopologyStrategy with replication factor 2, in one datacenter.

One thing I know I'm doing wrong is that I have a slightly differing number of hosts in each of my 6 chassis (one of them has 15 nodes, one has 13, and the remaining four have 14). Could what I'm seeing here be the effect of that?
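To get a feel for whether the uneven chassis sizes alone could explain this, here is a rough simulation sketch in Python. It assumes (none of this is stated above) that each chassis is configured as its own rack in the snitch, that every node uses random vnode tokens with num_tokens = 256, and that replica placement follows a simplified version of NetworkTopologyStrategy's rack-aware ring walk with RF 2. The function names are hypothetical. It reports each node's share of stored data relative to a perfectly even split, grouped by chassis, so the rack-size effect can be compared with the random per-node spread that vnodes introduce.

```python
import random
from collections import defaultdict

# Murmur3Partitioner tokens span -2**63 .. 2**63-1; shifting them to
# 0 .. 2**64-1 does not change range widths, so we use that for simplicity.
RING_SIZE = 2**64
NUM_TOKENS = 256  # assumption: num_tokens per node
RF = 2            # replication factor from the post


def build_ring(rack_sizes, num_tokens, seed=0):
    """Give every node num_tokens random tokens; return the sorted ring
    as (token, node) pairs plus a node -> rack map."""
    rng = random.Random(seed)
    ring, rack_of, node_id = [], {}, 0
    for rack, size in enumerate(rack_sizes):
        for _ in range(size):
            rack_of[node_id] = rack
            for _ in range(num_tokens):
                ring.append((rng.randrange(RING_SIZE), node_id))
            node_id += 1
    ring.sort()
    return ring, rack_of


def replicas_for(ring, rack_of, start, rf):
    """Simplified single-DC NetworkTopologyStrategy walk: starting at the
    range owner, go clockwise and prefer nodes on racks not yet used.
    Nodes on already-used racks are skipped, and are only used as a
    fallback if there are fewer racks than rf."""
    chosen, racks_seen, skipped = [], set(), []
    n = len(ring)
    for i in range(n):
        node = ring[(start + i) % n][1]
        if node in chosen or node in skipped:
            continue  # another vnode of a node we already considered
        if rack_of[node] not in racks_seen:
            chosen.append(node)
            racks_seen.add(rack_of[node])
        else:
            skipped.append(node)
        if len(chosen) == rf:
            return chosen
    return (chosen + skipped)[:rf]


def replica_ownership(rack_sizes, num_tokens=NUM_TOKENS, rf=RF, seed=0):
    """Fraction of all data each node stores, counting every replica
    (the fractions sum to rf)."""
    ring, rack_of = build_ring(rack_sizes, num_tokens, seed)
    owned = defaultdict(int)
    for i in range(len(ring)):
        # Range (previous token, this token]; i == 0 wraps around the ring.
        width = (ring[i][0] - ring[i - 1][0]) % RING_SIZE
        for node in replicas_for(ring, rack_of, i, rf):
            owned[node] += width
    return {node: owned[node] / RING_SIZE for node in rack_of}, rack_of


if __name__ == "__main__":
    sizes = [15, 13, 14, 14, 14, 14]  # nodes per chassis, as in the post
    frac, rack_of = replica_ownership(sizes)
    ideal = RF / sum(sizes)
    for rack, size in enumerate(sizes):
        loads = [frac[n] / ideal for n in frac if rack_of[n] == rack]
        print(f"chassis {rack} ({size} nodes): mean {sum(loads) / len(loads):.3f}x, "
              f"min {min(loads):.3f}x, max {max(loads):.3f}x of an even split")
```

If the chassis are not defined as separate racks in the snitch, rack size plays no part in placement and only the per-node vnode spread matters.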

Any other ideas on what could be wrong? Some kind of vnode imbalance? How can I diagnose that? What metrics should I be looking at?


Thanks,
\EF
