Hello,

I tried kernel 139.1: same issue. Two ksoftirqd processes are taking 200% CPU (100% each) continuously and the load is steady at 2.00. I guess this server is stuck with kernel 133.2.
[root@x ~]# uname -r
2.6.32-042stab139.1
[root@x ~]# cat /proc/loadavg
2.08 2.21 2.33 3/981 29612

   21 root      20   0     0    0    0 R 100.0  0.0  25:50.13 ksoftirqd/4
   25 root      20   0     0    0    0 R 100.0  0.0  25:50.42 ksoftirqd/5

Karl

On Sun, Jun 2, 2019 at 4:17 PM Karl Johnson <karljohnson...@gmail.com> wrote:
> Quick follow-up: I tried kernel 042stab137.1 with and without nohz=off,
> same issue, 3 cores at 100% each and load at 3.00+:
>
> [root@core1 ~]# ps auxf|grep ksoftirqd|grep 99
> root         9 99.1  0.0      0     0 ?        R    14:54  13:10  \_ [ksoftirqd/1]
> root        17 99.1  0.0      0     0 ?        R    14:54  13:10  \_ [ksoftirqd/3]
> root        33 99.1  0.0      0     0 ?        R    14:54  13:10  \_ [ksoftirqd/7]
>
> [root@core1 ~]# cat /proc/loadavg
> 3.22 3.61 2.83 4/975 20478
>
> I've downgraded to 2.6.32-042stab133.2 and everything is fine: load at
> 0.00, no CPU usage. Something is wrong between kernel 133.2 and 137.1;
> I haven't tested all the versions in between.
>
> Karl
>
> On Fri, May 31, 2019 at 1:55 AM Vasily Averin <v...@virtuozzo.com> wrote:
>
>> On 5/30/19 10:39 PM, Karl Johnson wrote:
>> > Hello,
>> >
>> > It's always related to swapper and ksoftirqd:
>> "swapper" is the idle thread; it runs when a CPU has no active tasks.
>> It would be interesting to look at the state of the "ksoftirqd" processes
>> several times, to see any changes.
>>
>> In the provided example I see that this process was captured while
>> processing the top-level function that handles soft interrupts:
>> do_softirq() -> call_softirq(). Usually these functions handle network
>> packets, and I expected your example to contain deeper call traces.
>> Probably this will happen next time.
>>
>> Anyway, these call traces show that the CPUs are NOT 100% busy
>> processing timer interrupts, so in general the situation looks as
>> expected: in the current theory, the ksoftirqd processes are handling
>> network traffic.
>>
>> Thank you,
>> Vasily Averin
>>
>> > Some examples here: https://pastebin.com/wn0nCwce
>> >
>> > Karl
>> >
>> > On Thu, May 30, 2019 at 3:11 PM Vasily Averin <v...@virtuozzo.com> wrote:
>> >
>> > Dear Karl,
>> > thank you for reporting the problem.
>> >
>> > No, it is not a known issue.
>> > Moreover, I doubt it is related to real hardware interrupts:
>> > soft interrupts handle deferred work such as processing of
>> > network packets.
>> >
>> > For troubleshooting, look at the stacks of the affected running
>> > processes via /proc/<pid>/stack.
>> > Alternatively, you can use the magic SysRq key:
>> > # echo l > /proc/sysrq-trigger
>> > It should dump the current state of all active CPUs.
>> > You can do it a few times to monitor the state of the affected processes.
>> >
>> > Thank you,
>> > Vasily Averin
>> >
>> >
>> > On 5/30/19 7:54 PM, Karl Johnson wrote:
>> > > Hello,
>> > >
>> > > I've upgraded from 2.6.32-042stab133.2 to 2.6.32-042stab138.1 and
>> > > since boot, 2 cores have been at 100% CPU on ksoftirqd:
>> > >
>> > > root        21 99.9  0.0      0     0 ?        R    May29 1178:07  \_ [ksoftirqd/4]
>> > > root        25 99.9  0.0      0     0 ?        R    May29 1177:51  \_ [ksoftirqd/5]
>> > >
>> > > From /proc/interrupts I can see that it's caused by the
>> > > IR-IO-APIC-edge timer:
>> > >
>> > >            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
>> > >   0:     136922     103603      26928      27528  112318229   71888343      73755     285735  IR-IO-APIC-edge  timer
>> > >
>> > > kernel /vmlinuz-2.6.32-042stab138.1 ro root=UUID=7367aa0f-8216-44ca-9cc4-affed22bbd9c rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM nohz=off nopti
>> > >
>> > > Any way to troubleshoot this? Is it a known issue?
>> > >
>> > > Karl
>> > >
>> > >
>> > > _______________________________________________
>> > > Users mailing list
>> > > Users@openvz.org
>> > > https://lists.openvz.org/mailman/listinfo/users
>> > >
>> >
>>
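The counters in /proc/interrupts are cumulative since boot, so the large CPU4/CPU5 timer totals in the original report say little on their own about the current interrupt rate. A minimal sketch for comparing two snapshots taken a few seconds apart, assuming the usual /proc/interrupts layout (a header row with one CPUn column per CPU, then one row per IRQ); the helper name irq_delta is hypothetical, not an existing tool:

```shell
#!/bin/sh
# irq_delta BEFORE AFTER IRQ
# Hypothetical helper: prints, per CPU, how many interrupts of IRQ number
# IRQ arrived between two saved copies of /proc/interrupts.
# Assumes both files come from the same boot and list the same CPUs.
irq_delta() {
    awk -v irq="$3:" '
        # Header row ("CPU0 CPU1 ..."): one field per CPU.
        FNR == 1 { ncpu = NF; next }
        $1 == irq {
            for (i = 2; i <= ncpu + 1; i++) {
                if (NR == FNR)
                    prev[i] = $i      # first file: remember the counters
                else
                    printf "CPU%d %d\n", i - 2, $i - prev[i]
            }
        }
    ' "$1" "$2"
}

# Example on the affected host (IRQ 0 is the timer line):
#   cat /proc/interrupts > /tmp/irq.1; sleep 10; cat /proc/interrupts > /tmp/irq.2
#   irq_delta /tmp/irq.1 /tmp/irq.2 0
```

Dividing each value by the sampling interval gives interrupts per second. If CPU4 and CPU5 show timer deltas in line with the other CPUs, the timer is probably not what keeps ksoftirqd busy, which would fit Vasily's reading of the call traces.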
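Vasily's advice to read /proc/<pid>/stack of the busy threads a few times can be scripted roughly as follows. This is a sketch only, assuming a procps ps that accepts -o pid=,stat=,comm= and root privileges (on many kernels /proc/<pid>/stack is readable only by root); the function name sample_softirq_stacks is hypothetical:

```shell
#!/bin/sh
# sample_softirq_stacks [ROUNDS] [INTERVAL]
# Hypothetical helper: prints the kernel stack of every ksoftirqd thread
# that is currently runnable (state R), ROUNDS times, INTERVAL seconds apart.
sample_softirq_stacks() {
    rounds=${1:-5}
    interval=${2:-1}
    n=1
    while [ "$n" -le "$rounds" ]; do
        echo "=== sample $n: $(date '+%H:%M:%S') ==="
        # List pid/state/command of all tasks, keep runnable ksoftirqd threads.
        ps -eo pid=,stat=,comm= |
            awk '$2 ~ /^R/ && $3 ~ /^ksoftirqd/ { print $1 }' |
            while read -r pid; do
                echo "--- pid $pid ---"
                cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable)"
            done
        n=$((n + 1))
        # Sleep between samples, but not after the last one.
        if [ "$n" -le "$rounds" ]; then
            sleep "$interval"
        fi
    done
}

# Example: five samples, one second apart.
#   sample_softirq_stacks 5 1
```

Comparing consecutive samples shows whether the threads stay in do_softirq()/call_softirq() or whether the deeper call traces Vasily expected (for example in network packet processing) show up.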