Leann,

Another one for the Kernel team to track.

                     Michael


On 02/15/2017 12:10 PM, Launchpad Bug Tracker wrote:
> bugproxy (bugproxy) has assigned this bug to you for Ubuntu:
>
> Issue:
> -----------
> The kernel is unable to handle a paging request and panics when a very
> large number of hugepages is passed as a boot argument.
>
> Environment:
> ----------------------
> Power NV : Habanero Bare metal
> OS : Ubuntu 17.04
> Kernel Version : 4.9.0-11-generic
>
> Steps To reproduce:
> -----------------------------------
>
> 1 - When booting the Ubuntu kernel, add the boot argument
> 'hugepages=12000000'.
>
> The kernel panics and displays a call trace like the one below.
>
> [    5.030274] Unable to handle kernel paging request for data at address 
> 0x00000000
> [    5.030323] Faulting instruction address: 0xc000000000302848
> [    5.030366] Oops: Kernel access of bad area, sig: 11 [#1]
> [    5.030399] SMP NR_CPUS=2048 [    5.030416] NUMA 
> [    5.039443] PowerNV
> [    5.039461] Modules linked in:
> [    5.050091] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.9.0-11-generic 
> #12-Ubuntu
> [    5.053266] Workqueue: events pcpu_balance_workfn
> [    5.080647] task: c000003c8fe9b800 task.stack: c000003ffb118000
> [    5.090876] NIP: c000000000302848 LR: c0000000002709d4 CTR: 
> c00000000016cef0
> [    5.094175] REGS: c000003ffb11b410 TRAP: 0300   Not tainted  
> (4.9.0-11-generic)
> [    5.103040] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>[    
> 5.114466]   CR: 22424222  XER: 00000000
> [    5.124932] CFAR: c000000000008a60 DAR: 0000000000000000 DSISR: 40000000 
> SOFTE: 1 
> GPR00: c0000000002709d4 c000003ffb11b690 c00000000141a400 c000003fff50e300 
> GPR04: 0000000000000000 00000000024001c2 c000003ffb11b780 000000219df50000 
> GPR08: 0000003ffb090000 c000000001454fd8 0000000000000000 0000000000000000 
> GPR12: 0000000000004400 c000000007b60000 00000000024001c2 00000000024001c2 
> GPR16: 00000000024001c2 0000000000000000 0000000000000000 0000000000000002 
> GPR20: 000000000000000c 0000000000000000 0000000000000000 00000000024200c0 
> GPR24: c0000000016eef48 0000000000000000 c000003fff50fd00 00000000024001c2 
> GPR28: 0000000000000000 c000003fff50fd00 c000003fff50e300 c000003ffb11b820 
> NIP [c000000000302848] mem_cgroup_soft_limit_reclaim+0xf8/0x4f0
> [    5.213613] LR [c0000000002709d4] do_try_to_free_pages+0x1b4/0x450
> [    5.230521] Call Trace:
> [    5.230643] [c000003ffb11b760] [c0000000002709d4] 
> do_try_to_free_pages+0x1b4/0x450
> [    5.254184] [c000003ffb11b800] [c000000000270d68] 
> try_to_free_pages+0xf8/0x270
> [    5.281896] [c000003ffb11b890] [c000000000259b88] 
> __alloc_pages_nodemask+0x7a8/0xff0
> [    5.321407] [c000003ffb11bab0] [c000000000282cd0] 
> pcpu_populate_chunk+0x110/0x520
> [    5.336262] [c000003ffb11bb50] [c0000000002841b8] 
> pcpu_balance_workfn+0x758/0x960
> [    5.351526] [c000003ffb11bc50] [c0000000000ecdd0] 
> process_one_work+0x2b0/0x5a0
> [    5.362561] [c000003ffb11bce0] [c0000000000ed168] worker_thread+0xa8/0x660
> [    5.374007] [c000003ffb11bd80] [c0000000000f5320] kthread+0x110/0x130
> [    5.385160] [c000003ffb11be30] [c00000000000c0e8] 
> ret_from_kernel_thread+0x5c/0x74
> [    5.389456] Instruction dump:
> [    5.410036] eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020 3d230001 e9499a42 
> 3d220004 
> [    5.423598] 3929abd8 794a1f24 7d295214 eac90100 <e9360000> 2fa90000 
> 419eff74 3b200000 
> [    5.436503] ---[ end trace 23b650e96be5c549 ]---
> [    5.439700] 
>
> This is purely a negative scenario: the system does not have enough
> memory for the very large hugepages value requested.
>
> Output of free on the system:
> free -h
>               total        used        free      shared  buff/cache   available
> Mem:           251G        2.1G        248G        5.2M        502M        248G
> Swap:          2.0G        159M        1.8G
>
> The same scenario was then tried after Linux was up, as follows:
>
> echo 12000000 > /proc/sys/vm/nr_hugepages
>
> HugePages_Total:   15069
> HugePages_Free:    15069
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:      16384 kB
> root@ltc-haba2:~# free -h
>               total        used        free      shared  buff/cache   available
> Mem:           251G        237G         13G        5.6M        311M         13G
> Swap:          2.0G        159M        1.8G
>
> In this case the kernel is able to allocate around 237 GB for hugetlb.
>
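
For a sense of scale, here is a rough arithmetic sketch using only the
figures quoted in this report (16384 kB hugepage size, 12000000 pages
requested, 15069 pages granted at runtime, 251G total as shown by free -h).
The constant and function names are illustrative, and nothing here queries
a live system.

```python
# Figures copied from the report; names are illustrative only.
HUGEPAGE_KB = 16384            # Hugepagesize from /proc/meminfo
REQUESTED_PAGES = 12_000_000   # value passed via hugepages=
GRANTED_PAGES = 15_069         # HugePages_Total after the runtime echo
TOTAL_RAM_GIB = 251            # "Mem: total" from free -h (rounded)

KB_PER_GIB = 1024 * 1024


def pages_to_gib(pages: int, page_kb: int = HUGEPAGE_KB) -> float:
    """Convert a hugepage count to GiB."""
    return pages * page_kb / KB_PER_GIB


requested_gib = pages_to_gib(REQUESTED_PAGES)
granted_gib = pages_to_gib(GRANTED_PAGES)

print(f"requested: {requested_gib:,.0f} GiB")
print(f"granted:   {granted_gib:.1f} GiB")
print(f"requested / RAM: {requested_gib / TOTAL_RAM_GIB:.0f}x")
```

The request works out to 187,500 GiB, roughly 747 times the installed
memory, while the 15069 pages actually granted come to about 235.5 GiB,
which together with the roughly 2G already in use is in line with the 237G
"used" figure above.
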
> But while the system is booting, it panics, so please let us know
> whether this scenario is expected to be handled.
>
> I identified the root cause of the panic.
> During mem cgroup initialisation the system is running low on memory
> because most of the pages have been grabbed as huge pages, and we hit a
> chicken-and-egg issue: when trying to allocate memory for the node's
> cgroup descriptor, we try to free some memory, and that path calls cgroup
> services which assume the node's cgroup descriptor is already allocated.
>
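
The chicken-and-egg situation described above can be illustrated with a
toy model. This is only a sketch of the failure pattern: the class and
function names are hypothetical and are not kernel symbols, and the guard
shown is just one possible way to break the cycle, not necessarily what
the upstream patch does.

```python
# Toy model only: all names are hypothetical, not kernel symbols.
from typing import Optional


class NodeInfo:
    """Stand-in for a per-node cgroup descriptor."""

    def __init__(self) -> None:
        self.soft_limit_tree: list = []


class Cgroup:
    def __init__(self) -> None:
        self.node_info: Optional[NodeInfo] = None  # not allocated yet


def soft_limit_reclaim(cg: Cgroup, guarded: bool) -> int:
    """Reclaim hook reached from the allocator when memory is short."""
    if guarded and cg.node_info is None:
        return 0  # descriptor not set up yet: nothing to reclaim
    # Without the guard this dereferences None, the analogue of the oops.
    return len(cg.node_info.soft_limit_tree)


def alloc_node_info(cg: Cgroup, free_pages: int, guarded: bool) -> bool:
    """Allocate the descriptor; with no free pages, try reclaim first."""
    if free_pages == 0:
        soft_limit_reclaim(cg, guarded)
        return False  # reclaim found nothing, so the allocation fails
    cg.node_info = NodeInfo()
    return True
```

With guarded=False and no free pages, alloc_node_info() raises an
AttributeError, which plays the role of the NULL access in the trace. With
guarded=True the same call returns False, i.e. a clean allocation failure
rather than a crash.
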
> I'm working on a patch which fixes this panic, but I think it is
> expected that the system fails due to OOM when all the pages are assigned
> to huge pages.
>
> Patch sent upstream, waiting for review : 
> https://patchwork.kernel.org/patch/9573799/
>
> ** Affects: ubuntu
>      Importance: Undecided
>      Assignee: Taco Screen team (taco-screen-team)
>          Status: New
>
>
> ** Tags: architecture-ppc64le bugnameltc-150852 severity-high 
> targetmilestone-inin1704

-- 
Michael Hohnbaum
OIL Program Manager
Power (ppc64el) Development Project Manager
Canonical, Ltd.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1665113

Title:
  [Ubuntu 17.04] Kernel panics when large number of hugepages is passed
  as an boot argument to kernel.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1665113/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs