** Tags added: kernel-daily-bug
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2097214
Title:
  MGLRU: page allocation failure on NUMA-enabled systems

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Noble:
  Fix Released
Status in linux source package in Oracular:
  Fix Released
Status in linux source package in Plucky:
  Confirmed

Bug description:
[Impact]
* On MGLRU-enabled systems, high memory pressure on a NUMA node causes
  page allocation failures
* This happens because page reclaim does not wake up the flusher threads,
  so dirty pages are never written back and remain unreclaimable
* The OOM killer can be triggered even though the system as a whole has
  enough available memory
[Test Plan]
* For the bug to trigger reliably, uninstall apport (so that core dumps are
  written directly to disk rather than piped to apport) and use the attached
  alloc_and_crash.c reproducer
* alloc_and_crash mmaps a huge range of memory, memsets it, and then
  forcibly SEGFAULTs so that the kernel dumps core; a sketch of the
  reproducer is shown below
* The attached bash script membinds alloc_and_crash to NUMA node 0, so the
  allocation failures show up in dmesg
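For illustration only, here is a minimal sketch of what alloc_and_crash.c
presumably does, based on the description above; the mapping size, flags,
and fill pattern are assumptions, and the attachment on the bug is
authoritative:

  /*
   * alloc_and_crash.c (sketch): mmap a large region, dirty it, then
   * crash so the kernel writes a core dump.
   * Build: gcc -O2 -o alloc_and_crash alloc_and_crash.c
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/mman.h>

  int main(int argc, char **argv)
  {
          /* Size in GiB from argv; the size the real reproducer uses is
           * not shown here, so this is a placeholder. */
          size_t gib = argc > 1 ? strtoull(argv[1], NULL, 10) : 4;
          size_t len = gib << 30;

          /* MAP_SHARED|MAP_ANONYMOUS is a guess based on the large shmem
           * counters in the report below; MAP_PRIVATE would also apply
           * pressure. */
          char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED) {
                  perror("mmap");
                  return 1;
          }

          /* Touch every byte so the kernel must back the whole range. */
          memset(p, 0xa5, len);

          /* Deliberate NULL write: the SIGSEGV makes the kernel core-dump
           * the huge dirty mapping, producing the ext4 writeback visible
           * in the call trace below. */
          *(volatile char *)0 = 0;
          return 0;
  }

The attached lp2097214-repro.sh then presumably pins this to node 0, e.g.
with numactl --membind=0 ./alloc_and_crash, so the pressure is confined to
a single NUMA node. Note that the bug only manifests with MGLRU active,
which can be verified via /sys/kernel/mm/lru_gen/enabled.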
$ sudo apt remove --purge apport
$ sudo dmesg -c; ./lp2097214-repro.sh; sleep 2; sudo dmesg
[ 124.974328] nvme 0014:01:00.0: Using 48-bit DMA addresses
[ 131.659813] alloc_and_crash: page allocation failure: order:0,
mode:0x141cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_WRITE),
nodemask=0,cpuset=/,mems_allowed=0-1
[ 131.659827] CPU: 114 PID: 2758 Comm: alloc_and_crash Not tainted
6.8.0-1021-nvidia-64k #23-Ubuntu
[ 131.659830] Hardware name: Supermicro MBD-G1SMH/G1SMH, BIOS 1.0c 12/28/2023
[ 131.659832] Call trace:
[ 131.659834] dump_backtrace+0xa4/0x150
[ 131.659841] show_stack+0x24/0x50
[ 131.659843] dump_stack_lvl+0xc8/0x138
[ 131.659847] dump_stack+0x1c/0x38
[ 131.659849] warn_alloc+0x16c/0x1f0
[ 131.659853] __alloc_pages_slowpath.constprop.0+0x8e4/0x9f0
[ 131.659855] __alloc_pages+0x2f0/0x3a8
[ 131.659857] alloc_pages_mpol+0x94/0x290
[ 131.659860] alloc_pages+0x6c/0x118
[ 131.659861] folio_alloc+0x24/0x98
[ 131.659862] filemap_alloc_folio+0x168/0x188
[ 131.659865] __filemap_get_folio+0x1bc/0x3f8
[ 131.659867] ext4_da_write_begin+0x144/0x300
[ 131.659870] generic_perform_write+0xc4/0x228
[ 131.659872] ext4_buffered_write_iter+0x78/0x180
[ 131.659874] ext4_file_write_iter+0x44/0xf0
[ 131.659876] __kernel_write_iter+0x10c/0x2c0
[ 131.659878] dump_user_range+0xe0/0x240
[ 131.659881] elf_core_dump+0x4cc/0x538
[ 131.659884] do_coredump+0x574/0x988
[ 131.659885] get_signal+0x7dc/0x8f0
[ 131.659887] do_signal+0x138/0x1f8
[ 131.659888] do_notify_resume+0x114/0x298
[ 131.659890] el0_da+0xdc/0x178
[ 131.659892] el0t_64_sync_handler+0xdc/0x158
[ 131.659894] el0t_64_sync+0x1b0/0x1b8
[ 131.659896] Mem-Info:
[ 131.659901] active_anon:12408 inactive_anon:3470004 isolated_anon:0
active_file:2437 inactive_file:264544 isolated_file:0
unevictable:609 dirty:260589 writeback:0
slab_reclaimable:9016 slab_unreclaimable:34145
mapped:3473656 shmem:3474196 pagetables:610
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:4015222 free_pcp:95 free_cma:48
[ 131.659904] Node 0 active_anon:660480kB inactive_anon:222080256kB
active_file:896kB inactive_file:16669696kB unevictable:9024kB
isolated(anon):0kB isolated(file):0kB mapped:222261312kB dirty:16669504kB
writeback:0kB shmem:222319552kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB
writeback_tmp:0kB kernel_stack:47872kB shadow_call_stack:62144kB
pagetables:28800kB
sec_pagetables:0kB all_unreclaimable? yes
[ 131.659908] Node 0 DMA free:1041984kB boost:0kB min:69888kB low:87360kB
high:104832kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:466176kB unevictable:0kB writepending:466176kB
present:2097152kB managed:2029632kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:3072kB
[ 131.659911] lowmem_reserve[]: 0 0 15189 15189 15189
[ 131.659915] Node 0 Normal free:8566336kB boost:0kB min:8575808kB
low:10719744kB high:12863680kB reserved_highatomic:0KB active_anon:660480kB
inactive_anon:222080256kB active_file:896kB inactive_file:16203520kB
unevictable:9024kB writepending:16203328kB present:249244544kB
managed:248932800kB mlocked:0kB bounce:0kB free_pcp:6080kB local_pcp:6080kB
free_cma:0kB
[ 131.659918] lowmem_reserve[]: 0 0 0 0 0
[ 131.659922] Node 0 DMA: 1*64kB (M) 0*128kB 2*256kB (UM) 2*512kB (UM)
2*1024kB (UC) 3*2048kB (UMC) 2*4096kB (UM) 1*8192kB (U) 2*16384kB (UM)
2*32768kB (UM) 2*65536kB (UM) 2*131072kB (UM) 2*262144kB (UM) 0*524288kB =
1041984kB
[ 131.659936] Node 0 Normal: 439*64kB (UE) 333*128kB (UME) 192*256kB (UME)
91*512kB (UME) 31*1024kB (UE) 12*2048kB (UME) 5*4096kB (UE) 2*8192kB (U)
3*16384kB (UME) 2*32768kB (UE) 1*65536kB (U) 2*131072kB (UE) 2*262144kB (UE)
14*524288kB (M) = 8566336kB
[ 131.659952] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=16777216kB
[ 131.659955] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=524288kB
[ 131.659956] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=2048kB
[ 131.659957] 3748043 total pagecache pages
[ 131.659959] 6731 pages in swap cache
[ 131.659960] Free swap = 0kB
[ 131.659961] Total swap = 8388544kB
[ 131.659961] 7858556 pages RAM
[ 131.659962] 0 pages HighMem/MovableOnly
[ 131.659963] 12344 pages reserved
[ 131.659964] 8192 pages cma reserved
[ 131.659965] 0 pages hwpoisoned
[Fix]
* The upstream patch wakes up the flusher threads if there are too many
  dirty entries in the coldest LRU generation
* The check runs while shrinking lruvecs, so the flushers are only woken
  under high memory pressure (sketched below)
* The fix was introduced by upstream commit:
  1bc542c6a0d1 mm/vmscan: wake up flushers conditionally to avoid cgroup
  OOM
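For reference, a paraphrased sketch of the shape of that commit (the MGLRU
eviction path in mm/vmscan.c); this is simplified, not the literal diff,
and the surrounding code is elided:

  static int evict_folios(struct lruvec *lruvec, struct scan_control *sc,
                          int swappiness)
  {
          /* ... isolate and try to reclaim folios from the oldest
           * (coldest) MGLRU generation ... */

          /*
           * If every file folio taken from the coldest generation was
           * dirty but not yet queued for writeback, reclaim cannot make
           * progress on its own: wake the flusher threads so the dirty
           * folios get written back and become reclaimable, instead of
           * declaring OOM while clean memory could still be produced.
           */
          if (sc->nr.unqueued_dirty &&
              sc->nr.unqueued_dirty == sc->nr.file_taken)
                  wakeup_flusher_threads(WB_REASON_VMSCAN);

          /* ... */
  }

wakeup_flusher_threads() with WB_REASON_VMSCAN is the existing writeback
API; the condition above is what keeps the flushers from being kicked
unless dirty pages have actually piled up in the coldest generation.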
[Regression Potential]
* This commit changes the memory reclaim path, so regressions would most
  likely show up under increased system memory pressure
* According to the upstream patch, increased SSD/disk wear is possible
  because the flusher threads are woken more often, although none was
  observed in testing
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2097214/+subscriptions