Hi -

We are running a small 3-node Cassandra cluster on Google Compute Engine.
I notice that our machines are reporting (via a collectd agent, confirmed
by `top`) a significant amount of cpu time in the NICE state.  For example,
one of our machines is an n1-highmem-4 (4 cores, 26 GB RAM).  Here is the
cpu line from top, just now:

%Cpu(s): 10.3 us, 15.7 sy, 22.8 ni, 44.4 id,  5.3 wa,  0.0 hi,  1.6 si,  0.0 st

So the cpus are spending >20% of the time in NICE, which seems strange.

Now I have no direct evidence that Cassandra has anything to do with this,
but we have several other backend services that run on other nodes and none
of them have shown any significant NICE usage at all.  I have tried
searching for NICE processes to see if one of them is the source, but I
can't find anything that looks viable.  For example, this command (which I
*think* is correct, but please sanity check me) shows that none of the
processes with a negative nice value have noticeable cpu usage.

$ ps -eo nice,pcpu,pid,args | grep '^\s*\-[1-9]'
-20  0.0     5 [kworker/0:0H]
-20  0.0    15 [kworker/1:0H]
-20  0.0    20 [kworker/2:0H]
-20  0.0    25 [kworker/3:0H]
-20  0.0    26 [khelper]
-20  0.0    28 [netns]
-20  0.0    29 [writeback]
-20  0.0    32 [kintegrityd]
-20  0.0    33 [bioset]
-20  0.0    34 [crypto]
-20  0.0    35 [kblockd]
-20  0.0    47 [kthrotld]
-20  0.0    48 [ipv6_addrconf]
-20  0.0    49 [deferwq]
-20  0.0   162 [scsi_tmf_0]
-20  0.0   179 [kworker/0:1H]
-20  0.0   197 [ext4-rsv-conver]
-20  0.0   214 [kworker/1:1H]
-20  0.0   480 [kworker/3:1H]
-20  0.0   481 [kworker/2:1H]
-20  0.0  1421 [ext4-rsv-conver]

By comparison, here is the cassandra process (just to verify that pcpu is
showing real values):

$ ps -eo nice,pcpu,pid,args | grep cassandra
  0  217  2498 java -ea -javaagent:/mnt/data-1/mn/cassandra/bin/../lib/jamm-0.2.5.jar [...]
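
One caveat on my own search: as far as I can tell from top(1), the ni
figure is time spent running "niced" user processes, which the kernel
counts for tasks with a positive nice value, so the negative-nice grep
above may be looking in the wrong direction.  A minimal sketch of the
complementary check follows.  It lists threads rather than processes and
keeps only rows with a positive nice value and nonzero %CPU.  The function
names niced_busy and list_niced_threads are my own, the ps options assume
the procps version of ps, and I have not included any output from it.

```shell
# Keep rows whose first column (nice) is a positive integer and whose
# second column (%CPU) is nonzero.  The header row and rows with a
# non-numeric nice value (e.g. "-") fail the regex and are dropped.
niced_busy() {
  awk '$1 ~ /^[0-9]+$/ && $1 > 0 && $2 + 0 > 0'
}

# One row per thread (-L), same leading columns as the commands above,
# plus the thread id and the short command name.
list_niced_threads() {
  ps -eLo nice,pcpu,pid,tid,comm | niced_busy
}
```

Calling list_niced_threads on an affected node should print any threads
that are both niced and busy; an empty result would rule this idea out.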

At this point I'm a bit stumped...  any ideas?

Cheers,
Ian
