Public bug reported:

== Comment: #0 - Santhosh G <santh...@in.ibm.com> - 2016-09-27 01:55:00 ==

Issue: The kernel is unable to handle a paging request when the heapshrink test case is run from the libhugetlbfs suite.
Environment: arch - ppc64le, Ubuntu KVM guest

Host related info:
------------------
Kernel:
root@ltc-haba1:~# uname -a
Linux ltc-haba1 4.8.0-17-generic #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

Memory:
root@ltc-haba1:~# free -h
       total   used   free   shared   buff/cache   available
Mem:    255G    65G   187G      22M         1.9G        188G
Swap:   225G     0B   225G

Hugepages configured:
root@ltc-haba1:~# cat /proc/meminfo | grep -i Huge
AnonHugePages:     81920 kB
ShmemHugePages:        0 kB
HugePages_Total:    4096
HugePages_Free:     3584
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB

Guest related info:
-------------------
Kernel:
root@ubuntu:~/libhugetlbfs# uname -a
Linux ubuntu 4.8.0-17-generic #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

Memory:
root@ubuntu:~/libhugetlbfs# free -h
       total   used   free   shared   buff/cache   available
Mem:    8.0G   133M   7.7G      15M         132M        7.5G
Swap:   3.3G     0B   3.3G

Hugepages configured:
root@ubuntu:~/libhugetlbfs# cat /proc/meminfo | grep -i Huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:     256
HugePages_Free:      256
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB

Steps to reproduce:
1. Install an Ubuntu KVM guest with hugepage memory backing.
2. git clone the latest libhugetlbfs from https://github.com/libhugetlbfs/libhugetlbfs.git
3. Configure hugepages in the guest and run make check. xmon is configured on the system.
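For step 1, hugepage backing of a KVM guest is typically requested through the libvirt domain XML. A minimal sketch is below; the memoryBacking/hugepages elements are standard libvirt syntax, the 16384 KiB page size comes from the Hugepagesize value reported above, and the domain name and memory size are illustrative only (the report's guest has 8 GB):

```xml
<!-- Fragment of a libvirt domain definition (edited with "virsh edit"),
     assuming the host has already reserved 16 MB hugepages as shown in
     the host /proc/meminfo above. Domain name is hypothetical. -->
<domain type='kvm'>
  <name>hugepage-guest</name>
  <memory unit='GiB'>8</memory>
  <memoryBacking>
    <hugepages>
      <!-- 16384 KiB matches the Hugepagesize reported on this ppc64le host -->
      <page size='16384' unit='KiB'/>
    </hugepages>
  </memoryBacking>
</domain>
```

Inside the guest, the pool for step 3 can then be reserved with `echo 256 > /proc/sys/vm/nr_hugepages` (256 x 16 MB = 4 GB, matching the HugePages_Total shown in the guest /proc/meminfo above) before running `make check`.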
The system gets call traces and enters the xmon console:

HUGETLB_VERBOSE=1 HUGETLB_MORECORE=yes heap-overflow (16M: 64):
[  281.735713] Unable to handle kernel paging request for data at address 0x4200000000328e38
[  281.735804] Faulting instruction address: 0xc00000000027b410
cpu 0x1: Vector: 300 (Data Access) at [c0000001fa8c3730]
    pc: c00000000027b410: shrink_active_list+0x300/0x4d0
    lr: c00000000027b3f4: shrink_active_list+0x2e4/0x4d0
    sp: c0000001fa8c39b0
   msr: 800000010280b033
   dar: 4200000000328e38
 dsisr: 42000000
  current = 0xc0000001fa8adc00
  paca    = 0xc00000000fb80900   softe: 0   irq_happened: 0x01
    pid   = 50, comm = kswapd0
Linux version 4.8.0-17-generic (buildd@bos01-ppc64el-025) (gcc version 6.2.0 20160914 (Ubuntu 6.2.0-3ubuntu15) ) #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 (Ubuntu 4.8.0-17.19-generic 4.8.0-rc7)
enter ? for help
[c0000001fa8c3aa0] c00000000027bbdc shrink_node_memcg+0x5fc/0x800
[c0000001fa8c3bc0] c00000000027bf0c shrink_node+0x12c/0x3f0
[c0000001fa8c3c80] c00000000027d500 kswapd+0x460/0x990
[c0000001fa8c3d80] c0000000000fd120 kthread+0x110/0x130
[c0000001fa8c3e30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c

xmon logs:
1:mon> e
cpu 0x1: Vector: 300 (Data Access) at [c0000001fa8e7730]
    pc: c00000000027b410: shrink_active_list+0x300/0x4d0
    lr: c00000000027b3f4: shrink_active_list+0x2e4/0x4d0
    sp: c0000001fa8e79b0
   msr: 800000010280b033
   dar: 42000000000c58d0
 dsisr: 42000000
  current = 0xc0000001fa8a0000
  paca    = 0xc00000000fb80900   softe: 0   irq_happened: 0x01
    pid   = 50, comm = kswapd0
Linux version 4.8.0-17-generic (buildd@bos01-ppc64el-025) (gcc version 6.2.0 20160914 (Ubuntu 6.2.0-3ubuntu15) ) #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 (Ubuntu 4.8.0-17.19-generic 4.8.0-rc7)
1:mon> r
R00 = c00000000027b3f4   R16 = c0000001fffcfe00
R01 = c0000001fa8e79b0   R17 = 000000000000010a
R02 = c0000000014e5e00   R18 = 42000000000cbdd0
R03 = 0000000000000001   R19 = c0000001fffc6300
R04 = 0000000000000005   R20 = c0000001fa8e79e0
R05 = 0000000000000000   R21 = c0000001fe144800
R06 = f0000000003bc9a0   R22 = 0000000000000001
R07 = 00000001fee30000   R23 = 0000000000000005
R08 = 000000000000002a   R24 = 000000000000207d
R09 = 0000000000000000   R25 = 0000000000000100
R10 = c000000001034e86   R26 = 0000000000000200
R11 = 0000000000000000   R27 = c0000001fa8e79d0
R12 = 0000000000002200   R28 = c0000001fa8e7ca0
R13 = c00000000fb80900   R29 = 0000000000000040
R14 = f000000000380000   R30 = c0000001fe144800
R15 = f000000000380020   R31 = c0000001fa8e79f0
pc  = c00000000027b410   shrink_active_list+0x300/0x4d0
cfar= c0000000000b47a4   kvmppc_call_hv_entry+0x130/0x134
lr  = c00000000027b3f4   shrink_active_list+0x2e4/0x4d0
msr = 800000010280b033   cr  = 24022222
ctr = c0000000002ba900   xer = 0000000020000000
trap = 300
dar = 42000000000c58d0   dsisr = 42000000
1:mon> t
[c0000001fa8e7aa0] c00000000027bc70 shrink_node_memcg+0x690/0x800
[c0000001fa8e7bc0] c00000000027bf0c shrink_node+0x12c/0x3f0
[c0000001fa8e7c80] c00000000027d500 kswapd+0x460/0x990
[c0000001fa8e7d80] c0000000000fd120 kthread+0x110/0x130
[c0000001fa8e7e30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c

== Comment: #2 - Santhosh G <santh...@in.ibm.com> - 2016-09-27 04:28:02 ==

Something similar to this issue is observed when the mm tests in LTP are run.
Call trace output:

oom01 0 TINFO
[ 2577.866629] Unable to handle kernel paging request for data at address 0x42000000004311d0
[ 2577.866759] Faulting instruction address: 0xc00000000027b410
[ 2577.866846] Oops: Kernel access of bad area, sig: 11 [#1]
[ 2577.866911] SMP NR_CPUS=2048 NUMA pSeries
[ 2577.866980] Modules linked in: vmx_crypto ip_tables x_tables autofs4 ibmvscsi crc32c_vpmsum
[ 2577.867152] CPU: 119 PID: 116856 Comm: oom01 Not tainted 4.8.0-17-generic #19-Ubuntu
[ 2577.867252] task: c000000db5d56000 task.stack: c00000031a898000
[ 2577.867334] NIP: c00000000027b410 LR: c00000000027b3f4 CTR: 0000000000000006
[ 2577.867433] REGS: c00000031a89b3e0 TRAP: 0300   Not tainted  (4.8.0-17-generic)
[ 2577.867531] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 28422222  XER: 20000000
[ 2577.867864] CFAR: c0000000000b477c DAR: 42000000004311d0 DSISR: 42000000 SOFTE: 0
GPR00: c00000000027b3f4 c00000031a89b660 c0000000014e5e00 0000000000000001
GPR04: 0000000000000005 0000000000000000 f000000000252960 0000000de7db0000
GPR08: 000000000000007d 0000000000000000 c000000001034e86 0000000000000000
GPR12: 0000000000002200 c00000000fbc2f00 f000000001ec8000 f000000001ec8020
GPR16: c000000defb93e00 0000000000000111 42000000004376d0 c000000defb8a300
GPR20: c00000031a89b690 c000000dee0a4800 0000000000000001 0000000000000005
GPR24: 0000000000023657 0000000000000100 0000000000000200 c00000031a89b680
GPR28: c00000031a89ba00 0000000000000040 c000000dee0a4800 c00000031a89b6a0
[ 2577.869185] NIP [c00000000027b410] shrink_active_list+0x300/0x4d0
[ 2577.869268] LR [c00000000027b3f4] shrink_active_list+0x2e4/0x4d0
[ 2577.869349] Call Trace:
[ 2577.869385] [c00000031a89b660] [c00000000027b3f4] shrink_active_list+0x2e4/0x4d0 (unreliable)
[ 2577.869518] [c00000031a89b750] [c00000000027bc70] shrink_node_memcg+0x690/0x800
[ 2577.869633] [c00000031a89b870] [c00000000027bf0c] shrink_node+0x12c/0x3f0
[ 2577.869733] [c00000031a89b930] [c00000000027c308] do_try_to_free_pages+0x138/0x480
[ 2577.869849] [c00000031a89b9e0] [c00000000027c74c] try_to_free_pages+0xfc/0x270
[ 2577.869963] [c00000031a89ba70] [c000000000264afc] __alloc_pages_nodemask+0x72c/0xee0
[ 2577.870081] [c00000031a89bc30] [c0000000002e1758] alloc_pages_vma+0x108/0x360
[ 2577.870181] [c00000031a89bcc0] [c0000000002ac5d4] handle_mm_fault+0x1024/0x14e0
[ 2577.870299] [c00000031a89bd80] [c000000000b90d50] do_page_fault+0x350/0x7d0
[ 2577.870435] [c00000031a89be30] [c000000000008948] handle_page_fault+0x10/0x30
[ 2577.870532] Instruction dump:
[ 2577.870578] 4bffbc19 7cb100d0 7ee4bb78 7e639b78 4800dbf9 60000000 892d023c 2f890000
[ 2577.870716] 409e01a4 7c2004ac 39200000 38600001 <91329b00> 4bd99b85 60000000 7fe3fb78
[ 2577.870845] ---[ end trace b2b062e289b7708f ]---
[ 2577.873701]

== Comment: #3 - Chandan Kumar <ckuma...@in.ibm.com> - 2016-09-27 05:18:41 ==

== Comment: #13 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-10-04 11:51:59 ==

== Comment: #14 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-10-05 04:18:52 ==

== Comment: #15 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-10-05 05:12:41 ==

== Comment: #17 - Luciano Chavez <cha...@us.ibm.com> - 2016-10-05 15:40:06 ==

== Comment: #22 - Richard M. Scheller <rsche...@us.ibm.com> - 2016-10-06 22:21:26 ==

(In reply to comment #21)
> Patched ubuntu kernel packages based on 4.8.0-19.21 are available here:
> http://www.lab.toulouse-stg.fr.ibm.com/~laurent/BZ146511/
>
> laurent@test1:~$ uname -v
> #21+bz146511 SMP Thu Oct 6 16:37:38 CEST 2016
>
> Please give a try.

I have run with this patched kernel on four guests on my Ubuntu 16.10 KVM host. Three of my guests are NOT backed by huge pages. The fourth guest is backed by huge pages. All four of these guests have PCI passthrough adapters. All four of these guests crashed and rebooted within a few hours with out-of-memory errors, both with the standard Ubuntu 4.8.0-19 kernel and with this patched kernel.
There are five other guests on the same host system which do not have PCI passthrough adapters. None of these guests are reproducing the out-of-memory errors, despite running the same test suites.

** Affects: linux (Ubuntu)
   Importance: Undecided
   Assignee: Taco Screen team (taco-screen-team)
   Status: New

** Tags: architecture-ppc64le bugnameltc-146776 severity-critical targetmilestone-inin---

** Tags added: architecture-ppc64le bugnameltc-146776 severity-critical targetmilestone-inin---

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1632458

Title:
  [Ubuntu 16.10] - System crashes and gives out call traces when libhugetlbfs test suite is run.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1632458/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs