** Changed in: ubuntu-power-systems
Importance: Undecided => High
** Changed in: linux (Ubuntu)
Importance: Undecided => High
** Changed in: linux (Ubuntu)
Assignee: Taco Screen team (taco-screen-team) => Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
--
https://bugs.launchpad.net/bugs/1649513
Title:
[Ubuntu 16.10] NMI watchdog and soft lockup while running htx memory tests in kernel 4.8.0-17-generic
Status in The Ubuntu-power-systems project:
Incomplete
Status in linux package in Ubuntu:
Incomplete
Bug description:
Issue:
--------------
The kernel reports "NMI watchdog: BUG: soft lockup" when the HTX memory test is run on Ubuntu 16.10.
Environment:
--------------------------
Arch : ppc64le
Platform : Ubuntu KVM Guest
Host : Ubuntu 16.10 [kernel 4.8.0-17]
Guest : Ubuntu 16.10 [kernel 4.8.0-17]
Steps To Reproduce:
-----------------------------------
1 - Install an Ubuntu KVM guest, then install the HTX package in the guest from:
http://ausgsa.ibm.com/projects/h/htx/public_html/htxonly/htxubuntu-413.deb
2 - Run the HTX mdt.mem test.
3 - The system hits a soft lockup, as shown below.
dmesg output:
[60287.590335] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 1141s! [hxemem64:23468]
[60287.590572] Modules linked in: vmx_crypto ip_tables x_tables autofs4 ibmvscsi crc32c_vpmsum
[60287.590585] CPU: 3 PID: 23468 Comm: hxemem64 Tainted: G L 4.8.0-17-generic #19-Ubuntu
[60287.590587] task: c0000012a0971e00 task.stack: c0000012a2d40000
[60287.590589] NIP: c000000000015004 LR: c000000000015004 CTR: c000000000165e90
[60287.590591] REGS: c0000012a2d439a0 TRAP: 0901 Tainted: G L (4.8.0-17-generic)
[60287.590592] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 48004244 XER: 00000000
[60287.590603] CFAR: c000000000165890 SOFTE: 1
GPR00: c000000000165f9c c0000012a2d43c20 c0000000014e5e00 0000000000000900
GPR04: 0000000000000000 0000000000000008 0000000100e4d61a 0000000000000000
GPR08: 0000000000000000 0000000000000006 0000000100e4d619 c0000012bfee3130
GPR12: 00003fffae6cdc70 00003fffae436900
[60287.590627] NIP [c000000000015004] arch_local_irq_restore+0x74/0x90
[60287.590630] LR [c000000000015004] arch_local_irq_restore+0x74/0x90
[60287.590631] Call Trace:
[60287.590634] [c0000012a2d43c20] [c0000012bfeccd80] 0xc0000012bfeccd80 (unreliable)
[60287.590639] [c0000012a2d43c40] [c000000000165f9c] run_timer_softirq+0x10c/0x230
[60287.590644] [c0000012a2d43ce0] [c000000000b94adc] __do_softirq+0x18c/0x3fc
[60287.590648] [c0000012a2d43de0] [c0000000000d5828] irq_exit+0xc8/0x100
[60287.590653] [c0000012a2d43e00] [c000000000024810] timer_interrupt+0xa0/0xe0
[60287.590657] [c0000012a2d43e30] [c000000000002814] decrementer_common+0x114/0x180
[60287.590659] Instruction dump:
[60287.590662] 994d023a 2fa30000 409e0024 e92d0020 61298000 7d210164 38210020 e8010010
[60287.590670] 7c0803a6 4e800020 60420000 4bfed259 <60000000> 4bffffe4 60420000 e92d0020
[63127.581494] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 339s! [hxemem64:23467]
[63127.629682] Modules linked in: vmx_crypto ip_tables x_tables autofs4 ibmvscsi crc32c_vpmsum
[63127.629699] CPU: 2 PID: 23467 Comm: hxemem64 Tainted: G L 4.8.0-17-generic #19-Ubuntu
[63127.629701] task: c0000012a0965800 task.stack: c0000012a2d58000
[63127.629703] NIP: 0000000010011e60 LR: 000000001000ec6c CTR: 0000000000f33196
[63127.629706] REGS: c0000012a2d5bea0 TRAP: 0901 Tainted: G L (4.8.0-17-generic)
[63127.629707] MSR: 800000010000d033 <SF,EE,PR,ME,IR,DR,RI,LE,TM[E]> CR: 42004482 XER: 00000000
[63127.629719] CFAR: 0000000010011e68 SOFTE: 1
GPR00: 000000001000e854 00003fffadc2e540 0000000010047f00 000000000000000d
GPR04: 0000000002000000 00003ff5a8000000 5a5a5a5a5a5a5a5a 00003ff5b0667348
GPR08: 0000000000000000 000000001006c8e0 000000001006ca04 fffffffffffff001
GPR12: 00003fffae6cdc70 00003fffadc36900
[63127.629740] NIP [0000000010011e60] 0x10011e60
[63127.629742] LR [000000001000ec6c] 0x1000ec6c
[63127.629743] Call Trace:
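The two traces differ in where the CPU was executing when the watchdog fired, and the MSR value encodes that. As a minimal sketch, assuming the 64-bit MSR bit positions used by the Linux kernel (arch/powerpc/include/asm/reg.h; only a subset is listed), the decoder below reproduces the bit lists the kernel printed. Under that assumption the first lockup was caught in kernel mode (PR clear; NIP in arch_local_irq_restore, reached from run_timer_softirq) and the second in user mode (PR set; NIP 0x10011e60 inside hxemem64). The helper names are illustrative only.

```python
# Subset of 64-bit Power MSR bits, numbered as in the Linux kernel's
# arch/powerpc/include/asm/reg.h (bit 0 = least significant bit).
MSR_BITS = {
    "SF": 63,   # 64-bit mode
    "HV": 60,   # hypervisor state
    "TM": 32,   # transactional memory available
    "VEC": 25,  # AltiVec available
    "VSX": 23,  # VSX available
    "EE": 15,   # external interrupts enabled
    "PR": 14,   # problem state, i.e. user mode
    "FP": 13,   # floating point available
    "ME": 12,   # machine check enabled
    "IR": 5,    # instruction relocation on
    "DR": 4,    # data relocation on
    "RI": 1,    # recoverable interrupt
    "LE": 0,    # little-endian mode
}


def decode_msr(msr: int) -> set[str]:
    """Return the names of the listed MSR bits that are set in msr."""
    return {name for name, bit in MSR_BITS.items() if (msr >> bit) & 1}


def mode(msr: int) -> str:
    """PR set means the CPU was executing user-space code."""
    return "user" if "PR" in decode_msr(msr) else "kernel"


if __name__ == "__main__":
    # MSR values copied from the two soft lockup reports above.
    for msr in (0x8000000000009033, 0x800000010000D033):
        print(hex(msr), sorted(decode_msr(msr)), mode(msr))
```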
== Comment: #3 - Santhosh G <[email protected]> - 2016-09-28 02:17:29 ==
Memory Info :
root@ubuntu:~# cat /proc/meminfo
MemTotal: 78539776 kB
MemFree: 72219392 kB
MemAvailable: 77217088 kB
Buffers: 212544 kB
Cached: 5249088 kB
SwapCached: 0 kB
Active: 1440832 kB
Inactive: 4107264 kB
Active(anon): 93888 kB
Inactive(anon): 8640 kB
Active(file): 1346944 kB
Inactive(file): 4098624 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 3443648 kB
SwapFree: 3443648 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 87296 kB
Mapped: 30400 kB
Shmem: 16128 kB
Slab: 381440 kB
SReclaimable: 295872 kB
SUnreclaim: 85568 kB
KernelStack: 2176 kB
PageTables: 2048 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 42639808 kB
Committed_AS: 224768 kB
VmallocTotal: 8589934592 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 9
HugePages_Free: 9
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16384 kB
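The meminfo figures are internally consistent. Assuming the default vm.overcommit_ratio of 50 (the report does not state the value), the CommitLimit formula documented for /proc/meminfo, CommitLimit = (MemTotal - huge page pool) × overcommit_ratio / 100 + SwapTotal, reproduces the reported 42639808 kB exactly, with the 9 pre-allocated 16 MB huge pages accounting for 9 × 16384 = 147456 kB. A small sketch of that check; the function names are hypothetical.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}.

    Values with a 'kB' suffix stay in kB; counts such as
    HugePages_Total are returned as plain integers.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        values = rest.split()
        if sep and values:
            info[key.strip()] = int(values[0])
    return info


def expected_commit_limit_kb(info: dict[str, int], overcommit_ratio: int = 50) -> int:
    """(MemTotal - huge page pool) * ratio / 100 + SwapTotal, all in kB.

    overcommit_ratio=50 is the kernel default and an assumption here.
    """
    hugetlb_kb = info["HugePages_Total"] * info["Hugepagesize"]
    return (info["MemTotal"] - hugetlb_kb) * overcommit_ratio // 100 + info["SwapTotal"]


# Fields copied from the meminfo output in comment #3.
SAMPLE = """\
MemTotal: 78539776 kB
SwapTotal: 3443648 kB
CommitLimit: 42639808 kB
HugePages_Total: 9
Hugepagesize: 16384 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(info["HugePages_Total"] * info["Hugepagesize"])       # 147456
    print(expected_commit_limit_kb(info), info["CommitLimit"])  # 42639808 42639808
```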
free -h:
              total        used        free      shared  buff/cache   available
Mem:            74G        545M         68G         15M        5.5G         73G
Swap:          3.3G          0B        3.3G
== Comment: #5 - Santhosh G <[email protected]> - 2016-09-29 02:49:49 ==
(In reply to comment #4)
> Hi Santhosh,
> After how long are you seeing this error ?
> Can you share the output by:
> 1) start the mdt.mem tests.
> 2) While the tests are running what is the output of 'free -h' ?
> 3) Attach /tmp/htxerr
>
> Thank you.
Hi Vaishnavi,
I have run the test for more than 12 hours and am not sure exactly when the lockup occurs.
Before starting the tests, free -h:
              total        used        free      shared  buff/cache   available
Mem:            74G        528M         68G         15M        5.5G         73G
Swap:          3.3G          0B        3.3G
After running the tests for more than 10 minutes:
              total        used        free      shared  buff/cache   available
Mem:            74G        570M         20G         48G         53G         25G
Swap:          3.3G          0B        3.3G
Memory usage increases gradually; it is not clear at which point the lockup occurs.
/tmp/htxerror is empty.
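Since the lockup only appeared after many hours of gradually rising memory use, one way to capture the memory state around the lockup is to log timestamped /proc/meminfo samples during the run and line them up with the dmesg timestamps afterwards. The sketch below is a hypothetical helper, not part of HTX; the sampled fields, the 60 s interval and the log path /tmp/meminfo.log are arbitrary choices. It keys each sample by /proc/uptime, which is only roughly the same clock as the [seconds] prefix on dmesg lines.

```python
import time

# Fields chosen for illustration: free memory, Shmem (the "shared" column
# of free -h, which grew to 48G during the run) and free swap.
FIELDS = ("MemFree", "MemAvailable", "Shmem", "SwapFree")


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: value} (kB, or a count)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        values = rest.split()
        if sep and values:
            info[key.strip()] = int(values[0])
    return info


def sample_line(uptime_text: str, meminfo_text: str) -> str:
    """Format one sample as '<seconds since boot> Field=kB ...'."""
    uptime_s = float(uptime_text.split()[0])
    info = parse_meminfo(meminfo_text)
    cols = " ".join(f"{name}={info[name]}" for name in FIELDS)
    return f"{uptime_s:.0f} {cols}"


def monitor(interval_s: float = 60.0, log_path: str = "/tmp/meminfo.log") -> None:
    """Append one sample per interval to log_path until interrupted (Ctrl-C)."""
    with open(log_path, "a") as log:
        while True:
            with open("/proc/uptime") as up, open("/proc/meminfo") as mem:
                log.write(sample_line(up.read(), mem.read()) + "\n")
            log.flush()
            time.sleep(interval_s)
```

Call monitor() from a second shell on the guest before starting mdt.mem and stop it with Ctrl-C; the last samples before a dmesg lockup timestamp show the memory state at that point.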
== Comment: #7 - Vaishnavi Bhat <[email protected]> - 2016-09-30 04:03:23 ==
Hi Santhosh,
While mdt.mem is running, we see that about 60% of memory is used, while swap usage stays at 0B.
              total        used        free      shared  buff/cache   available
Mem:            74G        570M         20G         48G         53G         25G
Swap:          3.3G          0B        3.3G
Top output
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1860 root 38 18 48.484g 0.046t 0.046t S 318.1 63.5 4865:53 hxemem64
dmesg also shows OOM traces and soft lockups involving hxemem64.
Can you please try increasing vm.min_free_kbytes and see whether it helps? I would suggest starting with double the current value.
Current value:
$ sysctl -n vm.min_free_kbytes
180224
New value:
$ sysctl -w vm.min_free_kbytes=<new value>
Thank you.
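The suggested starting point is simple arithmetic: 2 × 180224 = 360448 kB, which is exactly the value reported back in comment #10, so the doubled setting was in effect when the issue reproduced. A minimal sketch of the calculation; the helper names are hypothetical. Note that sysctl -w only changes the value until the next reboot.

```python
from pathlib import Path

# Runtime file behind the vm.min_free_kbytes sysctl.
PROC_PATH = Path("/proc/sys/vm/min_free_kbytes")


def read_min_free_kbytes(path: Path = PROC_PATH) -> int:
    """Read the current value in kB (same as `sysctl -n vm.min_free_kbytes`)."""
    return int(path.read_text().strip())


def doubled(current_kb: int) -> int:
    """Suggested first step from comment #7: twice the current value."""
    return 2 * current_kb


def sysctl_command(new_kb: int) -> str:
    """Command that applies the value at runtime (needs root, not persistent)."""
    return f"sysctl -w vm.min_free_kbytes={new_kb}"


if __name__ == "__main__":
    # Fall back to the value quoted in comment #7 when /proc is unavailable.
    current = read_min_free_kbytes() if PROC_PATH.exists() else 180224
    print(sysctl_command(doubled(current)))
```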
== Comment: #10 - Vaishnavi Bhat <[email protected]> - 2016-10-20 04:06:20 ==
(In reply to comment #9)
> Hi Vaishnavi,
>
> I am able to reproduce this issue even in 4.8.0-22-generic
>
> o/p:
> sysctl -n vm.min_free_kbytes
> 360448
>
> Please, take a look in to the issue.
>
> Thanks.
Thanks for confirming that the issue still reproduces with
sysctl -n vm.min_free_kbytes
360448
Thank you.