Leann,

Another one for the Kernel team to track.
Michael

On 02/15/2017 12:10 PM, Launchpad Bug Tracker wrote:
> bugproxy (bugproxy) has assigned this bug to you for Ubuntu:
>
> Issue:
> -----------
> The kernel is unable to handle a paging request and panics when a very
> large number of hugepages is passed as a boot argument to the kernel.
>
> Environment:
> ----------------------
> Power NV : Habanaro Bare metal
> OS : Ubuntu 17.04
> Kernel Version : 4.9.0-11-generic
>
> Steps To reproduce:
> -----------------------------------
>
> 1 - When booting the Ubuntu kernel, add the boot argument
> 'hugepages=12000000'.
>
> The kernel panics and displays a call trace like the one below.
>
> [ 5.030274] Unable to handle kernel paging request for data at address 0x00000000
> [ 5.030323] Faulting instruction address: 0xc000000000302848
> [ 5.030366] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 5.030399] SMP NR_CPUS=2048 [ 5.030416] NUMA
> [ 5.039443] PowerNV
> [ 5.039461] Modules linked in:
> [ 5.050091] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.9.0-11-generic #12-Ubuntu
> [ 5.053266] Workqueue: events pcpu_balance_workfn
> [ 5.080647] task: c000003c8fe9b800 task.stack: c000003ffb118000
> [ 5.090876] NIP: c000000000302848 LR: c0000000002709d4 CTR: c00000000016cef0
> [ 5.094175] REGS: c000003ffb11b410 TRAP: 0300 Not tainted (4.9.0-11-generic)
> [ 5.103040] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>[ 5.114466] CR: 22424222 XER: 00000000
> [ 5.124932] CFAR: c000000000008a60 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
> GPR00: c0000000002709d4 c000003ffb11b690 c00000000141a400 c000003fff50e300
> GPR04: 0000000000000000 00000000024001c2 c000003ffb11b780 000000219df50000
> GPR08: 0000003ffb090000 c000000001454fd8 0000000000000000 0000000000000000
> GPR12: 0000000000004400 c000000007b60000 00000000024001c2 00000000024001c2
> GPR16: 00000000024001c2 0000000000000000 0000000000000000 0000000000000002
> GPR20: 000000000000000c 0000000000000000 0000000000000000 00000000024200c0
> GPR24: c0000000016eef48 0000000000000000 c000003fff50fd00 00000000024001c2
> GPR28: 0000000000000000 c000003fff50fd00 c000003fff50e300 c000003ffb11b820
> NIP [c000000000302848] mem_cgroup_soft_limit_reclaim+0xf8/0x4f0
> [ 5.213613] LR [c0000000002709d4] do_try_to_free_pages+0x1b4/0x450
> [ 5.230521] Call Trace:
> [ 5.230643] [c000003ffb11b760] [c0000000002709d4] do_try_to_free_pages+0x1b4/0x450
> [ 5.254184] [c000003ffb11b800] [c000000000270d68] try_to_free_pages+0xf8/0x270
> [ 5.281896] [c000003ffb11b890] [c000000000259b88] __alloc_pages_nodemask+0x7a8/0xff0
> [ 5.321407] [c000003ffb11bab0] [c000000000282cd0] pcpu_populate_chunk+0x110/0x520
> [ 5.336262] [c000003ffb11bb50] [c0000000002841b8] pcpu_balance_workfn+0x758/0x960
> [ 5.351526] [c000003ffb11bc50] [c0000000000ecdd0] process_one_work+0x2b0/0x5a0
> [ 5.362561] [c000003ffb11bce0] [c0000000000ed168] worker_thread+0xa8/0x660
> [ 5.374007] [c000003ffb11bd80] [c0000000000f5320] kthread+0x110/0x130
> [ 5.385160] [c000003ffb11be30] [c00000000000c0e8] ret_from_kernel_thread+0x5c/0x74
> [ 5.389456] Instruction dump:
> [ 5.410036] eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020 3d230001 e9499a42 3d220004
> [ 5.423598] 3929abd8 794a1f24 7d295214 eac90100 <e9360000> 2fa90000 419eff74 3b200000
> [ 5.436503] ---[ end trace 23b650e96be5c549 ]---
> [ 5.439700]
>
> This is purely a negative scenario: the system does not have enough
> memory because hugepages is given a very large value.
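For a sense of scale, the request can be compared against installed memory with a little arithmetic. The sketch below is only an illustration: the 16384 kB huge page size, the 251G total and the 15069 pages granted at runtime are taken from the meminfo and free output quoted in this report, while reading "kB" as KiB and "G" as GiB is an assumption, and hugepage_bytes() is a hypothetical helper name.

```python
# Rough size check for the hugepages request in this report.
# Assumptions: "kB" in Hugepagesize means KiB and the "G" printed by
# free -h means GiB. hugepage_bytes() is a hypothetical helper.

KIB = 1024
GIB = 1024 ** 3


def hugepage_bytes(n_pages: int, page_size_kib: int) -> int:
    """Return the memory needed for n_pages huge pages, in bytes."""
    return n_pages * page_size_kib * KIB


PAGE_SIZE_KIB = 16384        # Hugepagesize from the report
TOTAL_RAM_BYTES = 251 * GIB  # "Mem: total" from free -h

# Size of the boot-time request and of what the kernel granted at runtime.
requested_gib = hugepage_bytes(12_000_000, PAGE_SIZE_KIB) / GIB
runtime_gib = hugepage_bytes(15_069, PAGE_SIZE_KIB) / GIB

print(f"boot request : {requested_gib:.0f} GiB")
print(f"runtime grant: {runtime_gib:.2f} GiB")
print(f"request exceeds RAM: {requested_gib * GIB > TOTAL_RAM_BYTES}")
```

Under these assumptions the boot argument asks for several hundred times the installed memory, while the 15069 pages granted at runtime come to roughly 235 GiB, which is in line with the 237G shown as used once the pre-existing 2.1G is added.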
>
> Output of free on the system:
> free -h
>               total        used        free      shared  buff/cache   available
> Mem:           251G        2.1G        248G        5.2M        502M        248G
> Swap:          2.0G        159M        1.8G
>
> The same scenario, when tried after Linux is up, as follows:
>
> echo 12000000 > /proc/sys/vm/nr_hugepages
>
> HugePages_Total: 15069
> HugePages_Free: 15069
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 16384 kB
> root@ltc-haba2:~# free -h
>               total        used        free      shared  buff/cache   available
> Mem:           251G        237G         13G        5.6M        311M         13G
> Swap:          2.0G        159M        1.8G
>
> In this case the kernel is able to allocate around 237 GB for hugetlb.
>
> But while the system is booting it panics, so please let us know whether
> this scenario is expected to be handled.
>
> I identified the root cause of the panic.
> When the system is running with low memory during mem cgroup initialisation,
> because most of the pages have been grabbed as huge pages, we hit a
> chicken-and-egg issue: when trying to allocate memory for the node's cgroup
> descriptor, we try to free some memory, and in this path cgroup services are
> called which assume the node's cgroup descriptor is already allocated.
>
> I'm working on a patch which fixes this panic, but I think it is
> expected that the system fails due to OOM when all the pages are assigned
> to huge pages.
>
> Patch sent upstream, waiting for review:
> https://patchwork.kernel.org/patch/9573799/
>
> ** Affects: ubuntu
>      Importance: Undecided
>      Assignee: Taco Screen team (taco-screen-team)
>      Status: New
>
>
> ** Tags: architecture-ppc64le bugnameltc-150852 severity-high
> targetmilestone-inin1704

--
Michael Hohnbaum
OIL Program Manager
Power (ppc64el) Development Project Manager
Canonical, Ltd.

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1665113

Title:
  [Ubuntu 17.04] Kernel panics when large number of hugepages is passed as an boot argument to kernel.