Ah, thanks Andras. Running `ps -T` I do in fact see that most Cassandra threads have priority 0 but a few are at priority 4, and these also show non-trivial pcpu. So that seems like it!
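
For anyone who finds this thread later: the "ni" figure in `top` counts user time spent in tasks with a *positive* nice value, so a filter for negative values cannot find the source. Here is a minimal sketch of a per-thread check, assuming a Linux procps `ps`. The helper name `niced_threads` is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: reads `ps` output on stdin (nice value in column 1)
# and keeps only the rows whose nice value is a positive integer.
# The header row, "-" placeholders and negative values are all skipped
# because they do not match the digits-only pattern.
niced_threads() {
  awk '$1 ~ /^[0-9]+$/ && $1 + 0 > 0'
}
```

Usage would be something like `ps -eLo nice,pcpu,pid,tid,comm | niced_threads`, where `-L` makes `ps` print one row per thread rather than one per process. A process-level listing only shows the main thread's nice value (0 for the Cassandra JVM here), which is presumably why the niced threads did not turn up earlier.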
- Ian

On Wed, Oct 8, 2014 at 11:56 AM, Andras Szerdahelyi <andras.szerdahe...@ignitionone.com> wrote:

> Hello,
>
> AFAIK Compaction threads run with a lower affinity, I believe that will
> show up as “niced”..
>
> Regards,
> Andras
>
> From: Ian Rose <ianr...@fullstory.com>
> Reply-To: user <user@cassandra.apache.org>
> Date: Wednesday 8 October 2014 17:29
> To: user <user@cassandra.apache.org>
> Subject: significant NICE cpu usage
>
> Hi -
>
> We are running a small 3-node cassandra cluster on Google Compute
> Engine. I notice that our machines are reporting (via a collectd agent,
> confirmed by `top`) a significant amount of cpu time in the NICE state.
> For example, one of our machines is a n1-highmem-4 (4 cores, 26 GB RAM).
> Here is the cpu line from `top`, just now:
>
> %Cpu(s): 10.3 us, 15.7 sy, 22.8 ni, 44.4 id, 5.3 wa, 0.0 hi, 1.6 si, 0.0 st
>
> So the cpus are spending >20% of the time in NICE, which seems strange.
>
> Now I have no direct evidence that Cassandra has anything to do with
> this, but we have several other backend services that run on other nodes
> and none of them have shown any significant NICE usage at all. I have
> tried searching for NICE processes to see if one of them is the source, but
> I can't find anything that looks viable. For example, this command (which
> I *think* is correct, but please sanity check me) shows that none of the
> processes with a negative priority have noticeable cpu usage.
>
> $ ps -eo nice,pcpu,pid,args | grep '^\s*\-[1-9]'
> -20  0.0     5 [kworker/0:0H]
> -20  0.0    15 [kworker/1:0H]
> -20  0.0    20 [kworker/2:0H]
> -20  0.0    25 [kworker/3:0H]
> -20  0.0    26 [khelper]
> -20  0.0    28 [netns]
> -20  0.0    29 [writeback]
> -20  0.0    32 [kintegrityd]
> -20  0.0    33 [bioset]
> -20  0.0    34 [crypto]
> -20  0.0    35 [kblockd]
> -20  0.0    47 [kthrotld]
> -20  0.0    48 [ipv6_addrconf]
> -20  0.0    49 [deferwq]
> -20  0.0   162 [scsi_tmf_0]
> -20  0.0   179 [kworker/0:1H]
> -20  0.0   197 [ext4-rsv-conver]
> -20  0.0   214 [kworker/1:1H]
> -20  0.0   480 [kworker/3:1H]
> -20  0.0   481 [kworker/2:1H]
> -20  0.0  1421 [ext4-rsv-conver]
>
> By comparison, here is the cassandra process (just to verify that pcpu
> is showing real values):
>
> $ ps -eo nice,pcpu,pid,args | grep cassandra
>   0  217  2498 java -ea -javaagent:/mnt/data-1/mn/cassandra/bin/../lib/jamm-0.2.5.jar [...]
>
> At this point I'm a bit stumped... any ideas?
>
> Cheers,
> Ian