Public bug reported:

== Comment: #0 - Santhosh G <santh...@in.ibm.com> - 2016-09-27 01:55:00 ==

Issue: The kernel is unable to handle a paging request when the heapshrink test case is run from the libhugetlbfs suite.
Environment: arch - ppc64le, Ubuntu KVM guest

Host related info:
------------------
Kernel:
root@ltc-haba1:~# uname -a
Linux ltc-haba1 4.8.0-17-generic #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

Memory:
root@ltc-haba1:~# free -h
       total   used   free   shared   buff/cache   available
Mem:    255G    65G   187G      22M         1.9G        188G
Swap:   225G     0B   225G

Hugepages configured:
root@ltc-haba1:~# cat /proc/meminfo | grep -i Huge
AnonHugePages:     81920 kB
ShmemHugePages:        0 kB
HugePages_Total:    4096
HugePages_Free:     3584
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB

Guest related info:
-------------------
Kernel:
root@ubuntu:~/libhugetlbfs# uname -a
Linux ubuntu 4.8.0-17-generic #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

Memory:
root@ubuntu:~/libhugetlbfs# free -h
       total   used   free   shared   buff/cache   available
Mem:    8.0G   133M   7.7G      15M         132M        7.5G
Swap:   3.3G     0B   3.3G

Hugepages configured:
root@ubuntu:~/libhugetlbfs# cat /proc/meminfo | grep -i Huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:     256
HugePages_Free:      256
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB

Steps to reproduce:
1. Install an Ubuntu KVM guest with hugepage memory backing.
2. git clone the latest libhugetlbfs from https://github.com/libhugetlbfs/libhugetlbfs.git
3. Configure hugepages in the guest and run make check. xmon is configured on the system.
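For step 1, hugepage backing of a KVM guest is typically requested through the libvirt domain XML. A minimal sketch is below; the memoryBacking/hugepages elements are standard libvirt syntax, the 16384 KiB page size comes from the Hugepagesize value reported above, and the domain name and memory size are illustrative only (the report's guest has 8 GB):

```xml
<!-- Fragment of a libvirt domain definition (edited with "virsh edit"),
     assuming the host has already reserved 16 MB hugepages as shown in
     the host /proc/meminfo above. Domain name is hypothetical. -->
<domain type='kvm'>
  <name>hugepage-guest</name>
  <memory unit='GiB'>8</memory>
  <memoryBacking>
    <hugepages>
      <!-- 16384 KiB matches the Hugepagesize reported on this ppc64le host -->
      <page size='16384' unit='KiB'/>
    </hugepages>
  </memoryBacking>
</domain>
```

Inside the guest, the pool for step 3 can then be reserved with `echo 256 > /proc/sys/vm/nr_hugepages` (256 x 16 MB = 4 GB, matching the HugePages_Total shown in the guest /proc/meminfo above) before running `make check`.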
The system gets call traces and enters the xmon console:

HUGETLB_VERBOSE=1 HUGETLB_MORECORE=yes heap-overflow (16M: 64):
[  281.735713] Unable to handle kernel paging request for data at address 0x4200000000328e38
[  281.735804] Faulting instruction address: 0xc00000000027b410
cpu 0x1: Vector: 300 (Data Access) at [c0000001fa8c3730]
    pc: c00000000027b410: shrink_active_list+0x300/0x4d0
    lr: c00000000027b3f4: shrink_active_list+0x2e4/0x4d0
    sp: c0000001fa8c39b0
   msr: 800000010280b033
   dar: 4200000000328e38
 dsisr: 42000000
  current = 0xc0000001fa8adc00
  paca    = 0xc00000000fb80900   softe: 0   irq_happened: 0x01
    pid   = 50, comm = kswapd0
Linux version 4.8.0-17-generic (buildd@bos01-ppc64el-025) (gcc version 6.2.0 20160914 (Ubuntu 6.2.0-3ubuntu15) ) #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 (Ubuntu 4.8.0-17.19-generic 4.8.0-rc7)
enter ? for help
[c0000001fa8c3aa0] c00000000027bbdc shrink_node_memcg+0x5fc/0x800
[c0000001fa8c3bc0] c00000000027bf0c shrink_node+0x12c/0x3f0
[c0000001fa8c3c80] c00000000027d500 kswapd+0x460/0x990
[c0000001fa8c3d80] c0000000000fd120 kthread+0x110/0x130
[c0000001fa8c3e30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c

xmon logs:
1:mon> e
cpu 0x1: Vector: 300 (Data Access) at [c0000001fa8e7730]
    pc: c00000000027b410: shrink_active_list+0x300/0x4d0
    lr: c00000000027b3f4: shrink_active_list+0x2e4/0x4d0
    sp: c0000001fa8e79b0
   msr: 800000010280b033
   dar: 42000000000c58d0
 dsisr: 42000000
  current = 0xc0000001fa8a0000
  paca    = 0xc00000000fb80900   softe: 0   irq_happened: 0x01
    pid   = 50, comm = kswapd0
Linux version 4.8.0-17-generic (buildd@bos01-ppc64el-025) (gcc version 6.2.0 20160914 (Ubuntu 6.2.0-3ubuntu15) ) #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 (Ubuntu 4.8.0-17.19-generic 4.8.0-rc7)
1:mon> r
R00 = c00000000027b3f4   R16 = c0000001fffcfe00
R01 = c0000001fa8e79b0   R17 = 000000000000010a
R02 = c0000000014e5e00   R18 = 42000000000cbdd0
R03 = 0000000000000001   R19 = c0000001fffc6300
R04 = 0000000000000005   R20 = c0000001fa8e79e0
R05 = 0000000000000000   R21 = c0000001fe144800
R06 = f0000000003bc9a0   R22 = 0000000000000001
R07 = 00000001fee30000   R23 = 0000000000000005
R08 = 000000000000002a   R24 = 000000000000207d
R09 = 0000000000000000   R25 = 0000000000000100
R10 = c000000001034e86   R26 = 0000000000000200
R11 = 0000000000000000   R27 = c0000001fa8e79d0
R12 = 0000000000002200   R28 = c0000001fa8e7ca0
R13 = c00000000fb80900   R29 = 0000000000000040
R14 = f000000000380000   R30 = c0000001fe144800
R15 = f000000000380020   R31 = c0000001fa8e79f0
pc  = c00000000027b410   shrink_active_list+0x300/0x4d0
cfar= c0000000000b47a4   kvmppc_call_hv_entry+0x130/0x134
lr  = c00000000027b3f4   shrink_active_list+0x2e4/0x4d0
msr = 800000010280b033   cr  = 24022222
ctr = c0000000002ba900   xer = 0000000020000000
trap = 300
dar = 42000000000c58d0   dsisr = 42000000
1:mon> t
[c0000001fa8e7aa0] c00000000027bc70 shrink_node_memcg+0x690/0x800
[c0000001fa8e7bc0] c00000000027bf0c shrink_node+0x12c/0x3f0
[c0000001fa8e7c80] c00000000027d500 kswapd+0x460/0x990
[c0000001fa8e7d80] c0000000000fd120 kthread+0x110/0x130
[c0000001fa8e7e30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c

== Comment: #2 - Santhosh G <santh...@in.ibm.com> - 2016-09-27 04:28:02 ==

Something similar to this issue is observed when the mm tests in LTP are run.
Call trace output:

oom01 0 TINFO
[ 2577.866629] Unable to handle kernel paging request for data at address 0x42000000004311d0
[ 2577.866759] Faulting instruction address: 0xc00000000027b410
[ 2577.866846] Oops: Kernel access of bad area, sig: 11 [#1]
[ 2577.866911] SMP NR_CPUS=2048 NUMA pSeries
[ 2577.866980] Modules linked in: vmx_crypto ip_tables x_tables autofs4 ibmvscsi crc32c_vpmsum
[ 2577.867152] CPU: 119 PID: 116856 Comm: oom01 Not tainted 4.8.0-17-generic #19-Ubuntu
[ 2577.867252] task: c000000db5d56000 task.stack: c00000031a898000
[ 2577.867334] NIP: c00000000027b410 LR: c00000000027b3f4 CTR: 0000000000000006
[ 2577.867433] REGS: c00000031a89b3e0 TRAP: 0300   Not tainted  (4.8.0-17-generic)
[ 2577.867531] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 28422222  XER: 20000000
[ 2577.867864] CFAR: c0000000000b477c DAR: 42000000004311d0 DSISR: 42000000 SOFTE: 0
GPR00: c00000000027b3f4 c00000031a89b660 c0000000014e5e00 0000000000000001
GPR04: 0000000000000005 0000000000000000 f000000000252960 0000000de7db0000
GPR08: 000000000000007d 0000000000000000 c000000001034e86 0000000000000000
GPR12: 0000000000002200 c00000000fbc2f00 f000000001ec8000 f000000001ec8020
GPR16: c000000defb93e00 0000000000000111 42000000004376d0 c000000defb8a300
GPR20: c00000031a89b690 c000000dee0a4800 0000000000000001 0000000000000005
GPR24: 0000000000023657 0000000000000100 0000000000000200 c00000031a89b680
GPR28: c00000031a89ba00 0000000000000040 c000000dee0a4800 c00000031a89b6a0
[ 2577.869185] NIP [c00000000027b410] shrink_active_list+0x300/0x4d0
[ 2577.869268] LR [c00000000027b3f4] shrink_active_list+0x2e4/0x4d0
[ 2577.869349] Call Trace:
[ 2577.869385] [c00000031a89b660] [c00000000027b3f4] shrink_active_list+0x2e4/0x4d0 (unreliable)
[ 2577.869518] [c00000031a89b750] [c00000000027bc70] shrink_node_memcg+0x690/0x800
[ 2577.869633] [c00000031a89b870] [c00000000027bf0c] shrink_node+0x12c/0x3f0
[ 2577.869733] [c00000031a89b930] [c00000000027c308] do_try_to_free_pages+0x138/0x480
[ 2577.869849] [c00000031a89b9e0] [c00000000027c74c] try_to_free_pages+0xfc/0x270
[ 2577.869963] [c00000031a89ba70] [c000000000264afc] __alloc_pages_nodemask+0x72c/0xee0
[ 2577.870081] [c00000031a89bc30] [c0000000002e1758] alloc_pages_vma+0x108/0x360
[ 2577.870181] [c00000031a89bcc0] [c0000000002ac5d4] handle_mm_fault+0x1024/0x14e0
[ 2577.870299] [c00000031a89bd80] [c000000000b90d50] do_page_fault+0x350/0x7d0
[ 2577.870435] [c00000031a89be30] [c000000000008948] handle_page_fault+0x10/0x30
[ 2577.870532] Instruction dump:
[ 2577.870578] 4bffbc19 7cb100d0 7ee4bb78 7e639b78 4800dbf9 60000000 892d023c 2f890000
[ 2577.870716] 409e01a4 7c2004ac 39200000 38600001 <91329b00> 4bd99b85 60000000 7fe3fb78
[ 2577.870845] ---[ end trace b2b062e289b7708f ]---
[ 2577.873701]

== Comment: #3 - Chandan Kumar <ckuma...@in.ibm.com> - 2016-09-27 05:18:41 ==

== Comment: #13 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-10-04 11:51:59 ==

== Comment: #14 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-10-05 04:18:52 ==

== Comment: #15 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-10-05 05:12:41 ==

== Comment: #17 - Luciano Chavez <cha...@us.ibm.com> - 2016-10-05 15:40:06 ==

== Comment: #22 - Richard M. Scheller <rsche...@us.ibm.com> - 2016-10-06 22:21:26 ==

(In reply to comment #21)
> Patched ubuntu kernel packages based on 4.8.0-19.21 are available here:
> http://www.lab.toulouse-stg.fr.ibm.com/~laurent/BZ146511/
>
> laurent@test1:~$ uname -v
> #21+bz146511 SMP Thu Oct 6 16:37:38 CEST 2016
>
> Please give a try.

I have run with this patched kernel on four guests on my Ubuntu 16.10 KVM host. Three of my guests are NOT backed by huge pages. The fourth guest is backed by huge pages. All four of these guests have PCI passthrough adapters. All four of these guests crashed and rebooted within a few hours with out-of-memory errors, both with the standard Ubuntu 4.8.0-19 kernel and with this patched kernel.
There are five other guests on the same host system which do not have PCI passthrough adapters. None of these guests are reproducing the out-of-memory errors, despite running the same test suites.

** Affects: linux (Ubuntu)
   Importance: Undecided
   Assignee: Taco Screen team (taco-screen-team)
   Status: New

** Tags: architecture-ppc64le bugnameltc-146776 severity-critical targetmilestone-inin---

** Tags added: architecture-ppc64le bugnameltc-146776 severity-critical targetmilestone-inin---

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1632458

Title:
  [Ubuntu 16.10] - System crashes and gives out call traces when libhugetlbfs test suite is run.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1632458/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs