** Description changed:

  [Impact]

- * On MGLRU-enabled systems, high memory pressure on NUMA nodes will cause page allocation failures
- * This happens due to page reclaim not waking up flusher threads
- * OOM can be triggered even if the system has enough available memory
+ * On MGLRU-enabled systems, high memory pressure on NUMA nodes will cause page allocation failures
+ * This happens due to page reclaim not waking up flusher threads
+ * OOM can be triggered even if the system has enough available memory

  [Test Plan]

- * For the bug to properly trigger, we should uninstall apport and use the attached alloc_and_crash.c reproducer
- * alloc_and_crash will mmap a huge range of memory, memset it and forcibly SEGFAULT
- * The attached bash script will membind alloc_and_crash to NUMA node 0, so we can see the allocation failures in dmesg
- $ sudo apt remove --purge apport
- $ sudo dmesg -c; ./repro.bash; sleep 2; sudo dmesg
+ * For the bug to properly trigger, we should uninstall apport and use the attached alloc_and_crash.c reproducer
+ * alloc_and_crash will mmap a huge range of memory, memset it and forcibly SEGFAULT
+ * The attached bash script will membind alloc_and_crash to NUMA node 0, so we can see the allocation failures in dmesg
+ $ sudo apt remove --purge apport
+ $ sudo dmesg -c; ./lp2097214-repro.sh; sleep 2; sudo dmesg
+
+ [ 124.974328] nvme 0014:01:00.0: Using 48-bit DMA addresses
+ [ 131.659813] alloc_and_crash: page allocation failure: order:0, mode:0x141cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_WRITE), nodemask=0,cpuset=/,mems_allowed=0-1
+ [ 131.659827] CPU: 114 PID: 2758 Comm: alloc_and_crash Not tainted 6.8.0-1021-nvidia-64k #23-Ubuntu
+ [ 131.659830] Hardware name: Supermicro MBD-G1SMH/G1SMH, BIOS 1.0c 12/28/2023
+ [ 131.659832] Call trace:
+ [ 131.659834] dump_backtrace+0xa4/0x150
+ [ 131.659841] show_stack+0x24/0x50
+ [ 131.659843] dump_stack_lvl+0xc8/0x138
+ [ 131.659847] dump_stack+0x1c/0x38
+ [ 131.659849] warn_alloc+0x16c/0x1f0
+ [ 131.659853] __alloc_pages_slowpath.constprop.0+0x8e4/0x9f0
+ [ 131.659855] __alloc_pages+0x2f0/0x3a8
+ [ 131.659857] alloc_pages_mpol+0x94/0x290
+ [ 131.659860] alloc_pages+0x6c/0x118
+ [ 131.659861] folio_alloc+0x24/0x98
+ [ 131.659862] filemap_alloc_folio+0x168/0x188
+ [ 131.659865] __filemap_get_folio+0x1bc/0x3f8
+ [ 131.659867] ext4_da_write_begin+0x144/0x300
+ [ 131.659870] generic_perform_write+0xc4/0x228
+ [ 131.659872] ext4_buffered_write_iter+0x78/0x180
+ [ 131.659874] ext4_file_write_iter+0x44/0xf0
+ [ 131.659876] __kernel_write_iter+0x10c/0x2c0
+ [ 131.659878] dump_user_range+0xe0/0x240
+ [ 131.659881] elf_core_dump+0x4cc/0x538
+ [ 131.659884] do_coredump+0x574/0x988
+ [ 131.659885] get_signal+0x7dc/0x8f0
+ [ 131.659887] do_signal+0x138/0x1f8
+ [ 131.659888] do_notify_resume+0x114/0x298
+ [ 131.659890] el0_da+0xdc/0x178
+ [ 131.659892] el0t_64_sync_handler+0xdc/0x158
+ [ 131.659894] el0t_64_sync+0x1b0/0x1b8
+ [ 131.659896] Mem-Info:
+ [ 131.659901] active_anon:12408 inactive_anon:3470004 isolated_anon:0
+  active_file:2437 inactive_file:264544 isolated_file:0
+  unevictable:609 dirty:260589 writeback:0
+  slab_reclaimable:9016 slab_unreclaimable:34145
+  mapped:3473656 shmem:3474196 pagetables:610
+  sec_pagetables:0 bounce:0
+  kernel_misc_reclaimable:0
+  free:4015222 free_pcp:95 free_cma:48
+ [ 131.659904] Node 0 active_anon:660480kB inactive_anon:222080256kB active_file:896kB inactive_file:16669696kB unevictable:9024kB isolated(anon):0kB isolated(file):0kB mapped:222261312kB dirty:16669504kB writeback:0kB
+ shmem:222319552kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:47872kB shadow_call_stack:62144kB pagetables:28800kB sec_pagetables:0kB all_unreclaimable? yes
+ [ 131.659908] Node 0 DMA free:1041984kB boost:0kB min:69888kB low:87360kB high:104832kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:466176kB unevictable:0kB writepending:466176kB present:2097152kB managed:2029632kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:3072kB
+ [ 131.659911] lowmem_reserve[]: 0 0 15189 15189 15189
+ [ 131.659915] Node 0 Normal free:8566336kB boost:0kB min:8575808kB low:10719744kB high:12863680kB reserved_highatomic:0KB active_anon:660480kB inactive_anon:222080256kB active_file:896kB inactive_file:16203520kB unevictable:9024kB writepending:16203328kB present:249244544kB managed:248932800kB mlocked:0kB bounce:0kB free_pcp:6080kB local_pcp:6080kB free_cma:0kB
+ [ 131.659918] lowmem_reserve[]: 0 0 0 0 0
+ [ 131.659922] Node 0 DMA: 1*64kB (M) 0*128kB 2*256kB (UM) 2*512kB (UM) 2*1024kB (UC) 3*2048kB (UMC) 2*4096kB (UM) 1*8192kB (U) 2*16384kB (UM) 2*32768kB (UM) 2*65536kB (UM) 2*131072kB (UM) 2*262144kB (UM) 0*524288kB = 1041984kB
+ [ 131.659936] Node 0 Normal: 439*64kB (UE) 333*128kB (UME) 192*256kB (UME) 91*512kB (UME) 31*1024kB (UE) 12*2048kB (UME) 5*4096kB (UE) 2*8192kB (U) 3*16384kB (UME) 2*32768kB (UE) 1*65536kB (U) 2*131072kB (UE) 2*262144kB (UE) 14*524288kB (M) = 8566336kB
+ [ 131.659952] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB
+ [ 131.659955] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
+ [ 131.659956] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
+ [ 131.659957] 3748043 total pagecache pages
+ [ 131.659959] 6731 pages in swap cache
+ [ 131.659960] Free swap = 0kB
+ [ 131.659961] Total swap = 8388544kB
+ [ 131.659961] 7858556 pages RAM
+ [ 131.659962] 0 pages HighMem/MovableOnly
+ [ 131.659963] 12344 pages reserved
+ [ 131.659964] 8192 pages cma reserved
+ [ 131.659965] 0 pages hwpoisoned

  [Fix]

- * The upstream patch wakes up flusher threads if there are too many dirty entries in the coldest LRU generation
- * This happens when trying to shrink lruvecs, so reclaim only gets woken up during high memory pressure
- * Fix was introduced by commit:
- 1bc542c6a0d1 mm/vmscan: wake up flushers conditionally to avoid cgroup OOM
+ * The upstream patch wakes up flusher threads if there are too many dirty entries in the coldest LRU generation
+ * This happens when trying to shrink lruvecs, so reclaim only gets woken up during high memory pressure
+ * Fix was introduced by commit:
+ 1bc542c6a0d1 mm/vmscan: wake up flushers conditionally to avoid cgroup OOM

  [Regression Potential]

- * This commit fixes the memory reclaim path, so regressions would likely show up during increased system memory pressure
- * According to the upstream patch, increased SSD/disk wearing is possible due to waking up flusher threads, although these have not been noted in testing
+ * This commit fixes the memory reclaim path, so regressions would likely show up during increased system memory pressure
+ * According to the upstream patch, increased SSD/disk wearing is possible due to waking up flusher threads, although these have not been noted in testing
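The alloc_and_crash.c reproducer and the membind script referenced in the
Test Plan above are bug attachments and are not included in this
notification. As a rough sketch of what the Test Plan describes (the 64 GiB
size and the use of raise() to force the crash are assumptions for
illustration, not the contents of the attached file):

/*
 * Sketch of the reproducer described above: mmap a huge anonymous region,
 * memset it so every page is allocated and dirtied, then force a SEGFAULT
 * so the kernel writes out a large core dump (the dump_user_range/ext4
 * writes visible in the trace). Size and crash mechanism are assumptions.
 */
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64UL << 30;    /* assumed size; tune to the target node's memory */

    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    /* Touch every page so it is actually allocated and dirty. */
    memset(buf, 0xa5, len);

    /* Forcibly SEGFAULT; dumping core then drives heavy page-cache writes. */
    raise(SIGSEGV);
    return 0;
}

The attached script presumably wraps this with something like
"numactl --membind=0 ./alloc_and_crash" so that the allocations and the
core-dump writeback are confined to NUMA node 0, where the allocation
failures then show up in dmesg.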
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2097214

Title:
  MGLRU: page allocation failure on NUMA-enabled systems

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Noble:
  Confirmed
Status in linux source package in Oracular:
  Confirmed
Status in linux source package in Plucky:
  Confirmed

Bug description:
  [Impact]

  * On MGLRU-enabled systems, high memory pressure on NUMA nodes will cause
    page allocation failures
  * This happens due to page reclaim not waking up flusher threads
  * OOM can be triggered even if the system has enough available memory

  [Test Plan]

  * For the bug to trigger properly, we should uninstall apport and use the
    attached alloc_and_crash.c reproducer
  * alloc_and_crash will mmap a huge range of memory, memset it and then
    force a SEGFAULT
  * The attached bash script will membind alloc_and_crash to NUMA node 0, so
    we can see the allocation failures in dmesg

  $ sudo apt remove --purge apport
  $ sudo dmesg -c; ./lp2097214-repro.sh; sleep 2; sudo dmesg

  [ 124.974328] nvme 0014:01:00.0: Using 48-bit DMA addresses
  [ 131.659813] alloc_and_crash: page allocation failure: order:0, mode:0x141cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_WRITE), nodemask=0,cpuset=/,mems_allowed=0-1
  [ 131.659827] CPU: 114 PID: 2758 Comm: alloc_and_crash Not tainted 6.8.0-1021-nvidia-64k #23-Ubuntu
  [ 131.659830] Hardware name: Supermicro MBD-G1SMH/G1SMH, BIOS 1.0c 12/28/2023
  [ 131.659832] Call trace:
  [ 131.659834] dump_backtrace+0xa4/0x150
  [ 131.659841] show_stack+0x24/0x50
  [ 131.659843] dump_stack_lvl+0xc8/0x138
  [ 131.659847] dump_stack+0x1c/0x38
  [ 131.659849] warn_alloc+0x16c/0x1f0
  [ 131.659853] __alloc_pages_slowpath.constprop.0+0x8e4/0x9f0
  [ 131.659855] __alloc_pages+0x2f0/0x3a8
  [ 131.659857] alloc_pages_mpol+0x94/0x290
  [ 131.659860] alloc_pages+0x6c/0x118
  [ 131.659861] folio_alloc+0x24/0x98
  [ 131.659862] filemap_alloc_folio+0x168/0x188
  [ 131.659865] __filemap_get_folio+0x1bc/0x3f8
  [ 131.659867] ext4_da_write_begin+0x144/0x300
  [ 131.659870] generic_perform_write+0xc4/0x228
  [ 131.659872] ext4_buffered_write_iter+0x78/0x180
  [ 131.659874] ext4_file_write_iter+0x44/0xf0
  [ 131.659876] __kernel_write_iter+0x10c/0x2c0
  [ 131.659878] dump_user_range+0xe0/0x240
  [ 131.659881] elf_core_dump+0x4cc/0x538
  [ 131.659884] do_coredump+0x574/0x988
  [ 131.659885] get_signal+0x7dc/0x8f0
  [ 131.659887] do_signal+0x138/0x1f8
  [ 131.659888] do_notify_resume+0x114/0x298
  [ 131.659890] el0_da+0xdc/0x178
  [ 131.659892] el0t_64_sync_handler+0xdc/0x158
  [ 131.659894] el0t_64_sync+0x1b0/0x1b8
  [ 131.659896] Mem-Info:
  [ 131.659901] active_anon:12408 inactive_anon:3470004 isolated_anon:0
   active_file:2437 inactive_file:264544 isolated_file:0
   unevictable:609 dirty:260589 writeback:0
   slab_reclaimable:9016 slab_unreclaimable:34145
   mapped:3473656 shmem:3474196 pagetables:610
   sec_pagetables:0 bounce:0
   kernel_misc_reclaimable:0
   free:4015222 free_pcp:95 free_cma:48
  [ 131.659904] Node 0 active_anon:660480kB inactive_anon:222080256kB active_file:896kB inactive_file:16669696kB unevictable:9024kB isolated(anon):0kB isolated(file):0kB mapped:222261312kB dirty:16669504kB writeback:0kB shmem:222319552kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:47872kB shadow_call_stack:62144kB pagetables:28800kB sec_pagetables:0kB all_unreclaimable? yes
  [ 131.659908] Node 0 DMA free:1041984kB boost:0kB min:69888kB low:87360kB high:104832kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:466176kB unevictable:0kB writepending:466176kB present:2097152kB managed:2029632kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:3072kB
  [ 131.659911] lowmem_reserve[]: 0 0 15189 15189 15189
  [ 131.659915] Node 0 Normal free:8566336kB boost:0kB min:8575808kB low:10719744kB high:12863680kB reserved_highatomic:0KB active_anon:660480kB inactive_anon:222080256kB active_file:896kB inactive_file:16203520kB unevictable:9024kB writepending:16203328kB present:249244544kB managed:248932800kB mlocked:0kB bounce:0kB free_pcp:6080kB local_pcp:6080kB free_cma:0kB
  [ 131.659918] lowmem_reserve[]: 0 0 0 0 0
  [ 131.659922] Node 0 DMA: 1*64kB (M) 0*128kB 2*256kB (UM) 2*512kB (UM) 2*1024kB (UC) 3*2048kB (UMC) 2*4096kB (UM) 1*8192kB (U) 2*16384kB (UM) 2*32768kB (UM) 2*65536kB (UM) 2*131072kB (UM) 2*262144kB (UM) 0*524288kB = 1041984kB
  [ 131.659936] Node 0 Normal: 439*64kB (UE) 333*128kB (UME) 192*256kB (UME) 91*512kB (UME) 31*1024kB (UE) 12*2048kB (UME) 5*4096kB (UE) 2*8192kB (U) 3*16384kB (UME) 2*32768kB (UE) 1*65536kB (U) 2*131072kB (UE) 2*262144kB (UE) 14*524288kB (M) = 8566336kB
  [ 131.659952] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB
  [ 131.659955] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
  [ 131.659956] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
  [ 131.659957] 3748043 total pagecache pages
  [ 131.659959] 6731 pages in swap cache
  [ 131.659960] Free swap = 0kB
  [ 131.659961] Total swap = 8388544kB
  [ 131.659961] 7858556 pages RAM
  [ 131.659962] 0 pages HighMem/MovableOnly
  [ 131.659963] 12344 pages reserved
  [ 131.659964] 8192 pages cma reserved
  [ 131.659965] 0 pages hwpoisoned

  [Fix]

  * The upstream patch wakes up flusher threads if there are too many dirty
    entries in the coldest LRU generation
  * This happens when trying to shrink lruvecs, so the flusher wakeup only
    occurs during high memory pressure
  * Fix was introduced by commit:
    1bc542c6a0d1 mm/vmscan: wake up flushers conditionally to avoid cgroup OOM

  [Regression Potential]

  * This commit fixes the memory reclaim path, so regressions would likely show
    up during increased system memory pressure
  * According to the upstream patch, increased SSD/disk wear is possible due
    to waking up flusher threads, although this has not been observed in testing

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2097214/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp