Hi - We are running a small 3-node Cassandra cluster on Google Compute Engine. I notice that our machines are reporting (via a collectd agent, confirmed by `top`) a significant amount of CPU time in the NICE state. For example, one of our machines is an n1-highmem-4 (4 cores, 26 GB RAM). Here is the CPU line from `top`, just now:

    %Cpu(s): 10.3 us, 15.7 sy, 22.8 ni, 44.4 id, 5.3 wa, 0.0 hi, 1.6 si, 0.0 st

So the CPUs are spending >20% of their time in NICE, which seems strange. I have no direct evidence that Cassandra has anything to do with this, but we have several other backend services running on other nodes, and none of them has shown any significant NICE usage at all.

I have tried searching for niced processes to see if one of them is the source, but I can't find anything that looks viable. For example, this command (which I *think* is correct, but please sanity check me) shows that none of the processes with a negative nice value have noticeable CPU usage.

    $ ps -eo nice,pcpu,pid,args | grep '^\s*\-[1-9]'
    -20  0.0     5 [kworker/0:0H]
    -20  0.0    15 [kworker/1:0H]
    -20  0.0    20 [kworker/2:0H]
    -20  0.0    25 [kworker/3:0H]
    -20  0.0    26 [khelper]
    -20  0.0    28 [netns]
    -20  0.0    29 [writeback]
    -20  0.0    32 [kintegrityd]
    -20  0.0    33 [bioset]
    -20  0.0    34 [crypto]
    -20  0.0    35 [kblockd]
    -20  0.0    47 [kthrotld]
    -20  0.0    48 [ipv6_addrconf]
    -20  0.0    49 [deferwq]
    -20  0.0   162 [scsi_tmf_0]
    -20  0.0   179 [kworker/0:1H]
    -20  0.0   197 [ext4-rsv-conver]
    -20  0.0   214 [kworker/1:1H]
    -20  0.0   480 [kworker/3:1H]
    -20  0.0   481 [kworker/2:1H]
    -20  0.0  1421 [ext4-rsv-conver]

By comparison, here is the Cassandra process (just to verify that pcpu is showing real values):

    $ ps -eo nice,pcpu,pid,args | grep cassandra
      0  217  2498 java -ea -javaagent:/mnt/data-1/mn/cassandra/bin/../lib/jamm-0.2.5.jar [...]

At this point I'm a bit stumped... any ideas?

Cheers,
Ian