Before trying the upstream kernel, I tried to reproduce the issue. After
noticing that it happened whenever there was heavy file I/O, I was able
to reproduce it at will by running applications that do a lot of file
I/O. I was also monitoring free memory every second to understand why
the kernel was invoking the oom-killer and killing random applications.
When the oom-killer started killing applications, memory looked like
this:

Every 1.0s: free -h         gorilla: Sat Jan 14 09:52:01 2017
              total        used        free      shared  buff/cache   available
Mem:           5.9G        755M        127M         17M        5.1G        4.6G
Swap:          2.0G          0B        2.0G
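
For anyone who wants to log the same figures instead of watching them, here is a minimal sketch that reads them from /proc/meminfo. The helper names meminfo_kib and mem_sample, and the optional file argument, are made up for illustration and are not part of any tool mentioned in this report.

```shell
#!/bin/sh
# Print the value (in KiB) of one field from /proc/meminfo.
# meminfo_kib is a hypothetical helper; $2 optionally names another file.
meminfo_kib() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Print one timestamped sample with the free, available and cached figures.
mem_sample() {
    printf '%s free=%s available=%s cached=%s\n' \
        "$(date '+%F %T')" \
        "$(meminfo_kib MemFree "$1")" \
        "$(meminfo_kib MemAvailable "$1")" \
        "$(meminfo_kib Cached "$1")"
}

# Example (the log file name is an assumption):
#   while sleep 1; do mem_sample; done >> mem.log
```

A log like this makes it easier to line up the memory state with the oom-killer timestamps in syslog afterwards.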

As you can see, there is a lot of available memory (mostly in cache, and I am
very sure most of it is clean cache), but for some reason it was not reclaimed
by the kernel (kswapd0?). So I decided to run "echo 3 > /proc/sys/vm/drop_caches"
frequently to force the cache to be dropped, and sure enough everything worked fine.
So far, I have not seen this problem in the last 2+ days.
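
The workaround above could be wrapped in a small helper along these lines. This is only a sketch: the report does not say how often the command was run, so the 60-second interval in the comment is an assumption, and DROP_CACHES_FILE is a hypothetical override that exists purely so the function can be dry-run without root.

```shell
#!/bin/sh
# Workaround sketch: flush dirty pages first, then ask the kernel to drop
# the clean page cache, dentries and inodes. Needs root on a real system.
# DROP_CACHES_FILE is a hypothetical override, useful only for dry runs.
drop_caches() {
    sync
    echo 3 > "${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"
}

# Example loop (the 60-second interval is an assumption, not from the report):
#   while sleep 60; do drop_caches; done
```

Running sync first matters because drop_caches only discards clean pages; dirty pages stay in memory until they are written back.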
 
root@gorilla:~# cat /var/log/syslog|egrep "NMI watchdog: BUG: soft lockup|oom-killer"
root@gorilla:~# uptime
 07:37:29 up 2 days, 19:34,  1 user,  load average: 1.63, 0.77, 0.29
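
The empty grep output above is the whole check. If a number is easier to track over time, the same pattern can be counted instead; the sketch below assumes the messages land in /var/log/syslog as they do here, and count_kernel_events is a made-up helper name.

```shell
#!/bin/sh
# Count log lines mentioning a soft lockup or an oom-killer invocation.
# count_kernel_events is a hypothetical helper; $1 defaults to /var/log/syslog.
count_kernel_events() {
    grep -cE 'NMI watchdog: BUG: soft lockup|oom-killer' "${1:-/var/log/syslog}"
}
```

Note that grep -c still prints 0 when nothing matches, although it exits with status 1 in that case.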

Since I now suspect this may be a bug in kswapd0, I searched here for
similar kswapd0 issues and found one (see below). I am not sure it is
the same problem, though the symptoms and workaround are the same.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457

Near the end of that report, comment #142 says there is no problem with
the 4.4.0-45 kernel but the Yakkety-based 4.8+ kernel has it. Assuming
this is the same issue, I can confirm that, as I never had this
problem before upgrading to Yakkety. I wonder whether the bug has made
its way back since that fix. Since I have a workaround, I am going to
continue with it; it is not ideal, but it seems to be holding up. The
last note on the above report says the bug is fixed and any new problem
should be opened as a new bug. Can this report be treated as a new bug
to address this problem?

Thanks
 


** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1655356

Title:
  NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kswapd0:50];
  oom-killer; and eventual kernel panic on 16.10 (upgrade from 16.04)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1655356/+subscriptions

