Quick follow-up: I tried kernel 042stab137.1 with and without nohz=off and got the same issue, with 3 cores at 100% each and load at 3.00+:

[root@core1 ~]# ps auxf|grep ksoftirqd|grep 99
root         9 99.1  0.0      0     0 ?        R    14:54  13:10  \_ [ksoftirqd/1]
root        17 99.1  0.0      0     0 ?        R    14:54  13:10  \_ [ksoftirqd/3]
root        33 99.1  0.0      0     0 ?        R    14:54  13:10  \_ [ksoftirqd/7]
[root@core1 ~]# cat /proc/loadavg
3.22 3.61 2.83 4/975 20478

I've downgraded to 2.6.32-042stab133.2 and everything is fine, load at 0.00, no CPU usage. There's something wrong between kernel 133.2 and 137.1. I haven't tested them all.

Karl

On Fri, May 31, 2019 at 1:55 AM Vasily Averin <v...@virtuozzo.com> wrote:

> On 5/30/19 10:39 PM, Karl Johnson wrote:
> > Hello,
> >
> > It's always related to swapper and ksoftirqd:
>
> "swapper" is idle thread, it is called if CPU does not have any active tasks
> it would be interesting to look at state of "ksoftirqd" processes several
> times, to see any changes.
>
> In provided example I see that this process was captured during processing
> of top-level function handles soft interrupts:
> do_softirq()-> call_softirq(). Usually these function handles network
> packets and I expected your example will contain more deep calltraces.
> Probably this happen next time.
>
> Anyway, these calltraces shows that CPUs are NOT 100% busy by processing
> of timer interrupts,
> so in general the situation looks like expected: in current theory
> ksoftirq processes handles network traffic.
>
> Thank you,
>         Vasily Averin
>
> > Some examples here: https://pastebin.com/wn0nCwce
> >
> > Karl
> >
> > On Thu, May 30, 2019 at 3:11 PM Vasily Averin <v...@virtuozzo.com> wrote:
> >
> >     Dear Karl,
> >     thank you for reporting the problem.
> >
> >     no, it is not known issue.
> >     moreover, I doubt it is related to real hardware interrupts,
> >     soft-interrupts handles delayed procedures like processing of
> >     network packets.
> >
> >     For troubleshooting is to look at stack of affected running
> >     processes via /proc/<pid>/stack
> >     alternatively you can use magic sysrq key
> >     # echo l > /proc/sysrq-trigger
> >     it should dump current state of all running processors.
> >     you can do it few times to monitor state of affected processes.
> >
> >     Thank you,
> >             Vasily Averin
> >
> >
> >     On 5/30/19 7:54 PM, Karl Johnson wrote:
> >     > Hello,
> >     >
> >     > I've upgraded from 2.6.32-042stab133.2 to 2.6.32-042stab138.1 and
> >     > since boot, 2 cores are using 100% cpu on ksoftirqd:
> >     >
> >     > root        21 99.9  0.0      0     0 ?        R    May29 1178:07  \_ [ksoftirqd/4]
> >     > root        25 99.9  0.0      0     0 ?        R    May29 1177:51  \_ [ksoftirqd/5]
> >     >
> >     > From /proc/interrupts I can see that it's caused by
> >     > IR-IO-APIC-edge timer:
> >     >
> >     >            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
> >     >   0:     136922     103603      26928      27528  112318229   71888343      73755     285735  IR-IO-APIC-edge  timer
> >     >
> >     > kernel /vmlinuz-2.6.32-042stab138.1 ro
> >     > root=UUID=7367aa0f-8216-44ca-9cc4-affed22bbd9c rd_NO_LUKS rd_NO_LVM
> >     > LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto
> >     > KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM nohz=off nopti
> >     >
> >     > Any way to troubleshoot this? Is it a known issue?
> >     >
> >     > Karl
> >     >
> >     >
> >     > _______________________________________________
> >     > Users mailing list
> >     > Users@openvz.org
> >     > https://lists.openvz.org/mailman/listinfo/users
> >     >
> >
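
The advice quoted above (look at /proc/<pid>/stack of the affected processes, and do it a few times) can be scripted. Below is a minimal sketch, assuming a POSIX shell run as root on a kernel that exposes /proc/<pid>/stack. The function names and the PROC_ROOT variable are made up for illustration and are not part of any OpenVZ tool.

```shell
#!/bin/sh
# PROC_ROOT is a hypothetical knob so the functions can be pointed at a
# copy of /proc; on a live system leave it at the default.
PROC_ROOT="${PROC_ROOT:-/proc}"

# Print the PID of every thread whose name starts with "ksoftirqd/".
# The second field of /proc/<pid>/stat is the name in parentheses.
ksoftirqd_pids() {
    for stat in "$PROC_ROOT"/[0-9]*/stat; do
        [ -r "$stat" ] || continue
        read -r pid comm _rest < "$stat" || true
        case "$comm" in
            "(ksoftirqd/"*) echo "$pid" ;;
        esac
    done
}

# Dump the kernel stack of each ksoftirqd thread.
# $1 = number of samples (default 5), $2 = seconds between samples (default 1).
sample_stacks() {
    samples="${1:-5}"
    interval="${2:-1}"
    i=1
    while [ "$i" -le "$samples" ]; do
        for pid in $(ksoftirqd_pids); do
            echo "=== sample $i pid $pid ==="
            cat "$PROC_ROOT/$pid/stack" 2>/dev/null || echo "(stack not readable)"
        done
        i=$((i + 1))
        if [ "$i" -le "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# Example (as root): sample_stacks 5 2 > ksoftirqd-stacks.txt
```

Comparing the samples should show whether the busy ksoftirqd threads stay in the same call path or move between different softirq handlers.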