** Description changed:

  [Impact]

- * On MGLRU-enabled systems, high memory pressure on NUMA nodes will cause page allocation failures
- * This happens due to page reclaim not waking up flusher threads
- * OOM can be triggered even if the system has enough available memory
+ * On MGLRU-enabled systems, high memory pressure on NUMA nodes will cause page allocation failures
+ * This happens due to page reclaim not waking up flusher threads
+ * OOM can be triggered even if the system has enough available memory

  [Test Plan]

- * For the bug to properly trigger, we should uninstall apport and use the attached alloc_and_crash.c reproducer
- * alloc_and_crash will mmap a huge range of memory, memset it and forcibly SEGFAULT
- * The attached bash script will membind alloc_and_crash to NUMA node 0, so we can see the allocation failures in dmesg
- $ sudo apt remove --purge apport
- $ sudo dmesg -c; ./repro.bash; sleep 2; sudo dmesg
+ * For the bug to properly trigger, we should uninstall apport and use the attached alloc_and_crash.c reproducer
+ * alloc_and_crash will mmap a huge range of memory, memset it and forcibly SEGFAULT
+ * The attached bash script will membind alloc_and_crash to NUMA node 0, so we can see the allocation failures in dmesg
+ $ sudo apt remove --purge apport
+ $ sudo dmesg -c; ./lp2097214-repro.sh; sleep 2; sudo dmesg
+
+ [ 124.974328] nvme 0014:01:00.0: Using 48-bit DMA addresses
+ [ 131.659813] alloc_and_crash: page allocation failure: order:0, mode:0x141cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_WRITE), nodemask=0,cpuset=/,mems_allowed=0-1
+ [ 131.659827] CPU: 114 PID: 2758 Comm: alloc_and_crash Not tainted 6.8.0-1021-nvidia-64k #23-Ubuntu
+ [ 131.659830] Hardware name: Supermicro MBD-G1SMH/G1SMH, BIOS 1.0c 12/28/2023
+ [ 131.659832] Call trace:
+ [ 131.659834] dump_backtrace+0xa4/0x150
+ [ 131.659841] show_stack+0x24/0x50
+ [ 131.659843] dump_stack_lvl+0xc8/0x138
+ [ 131.659847] dump_stack+0x1c/0x38
+ [ 131.659849] warn_alloc+0x16c/0x1f0
+ [ 131.659853] __alloc_pages_slowpath.constprop.0+0x8e4/0x9f0
+ [ 131.659855] __alloc_pages+0x2f0/0x3a8
+ [ 131.659857] alloc_pages_mpol+0x94/0x290
+ [ 131.659860] alloc_pages+0x6c/0x118
+ [ 131.659861] folio_alloc+0x24/0x98
+ [ 131.659862] filemap_alloc_folio+0x168/0x188
+ [ 131.659865] __filemap_get_folio+0x1bc/0x3f8
+ [ 131.659867] ext4_da_write_begin+0x144/0x300
+ [ 131.659870] generic_perform_write+0xc4/0x228
+ [ 131.659872] ext4_buffered_write_iter+0x78/0x180
+ [ 131.659874] ext4_file_write_iter+0x44/0xf0
+ [ 131.659876] __kernel_write_iter+0x10c/0x2c0
+ [ 131.659878] dump_user_range+0xe0/0x240
+ [ 131.659881] elf_core_dump+0x4cc/0x538
+ [ 131.659884] do_coredump+0x574/0x988
+ [ 131.659885] get_signal+0x7dc/0x8f0
+ [ 131.659887] do_signal+0x138/0x1f8
+ [ 131.659888] do_notify_resume+0x114/0x298
+ [ 131.659890] el0_da+0xdc/0x178
+ [ 131.659892] el0t_64_sync_handler+0xdc/0x158
+ [ 131.659894] el0t_64_sync+0x1b0/0x1b8
+ [ 131.659896] Mem-Info:
+ [ 131.659901] active_anon:12408 inactive_anon:3470004 isolated_anon:0
+  active_file:2437 inactive_file:264544 isolated_file:0
+  unevictable:609 dirty:260589 writeback:0
+  slab_reclaimable:9016 slab_unreclaimable:34145
+  mapped:3473656 shmem:3474196 pagetables:610
+  sec_pagetables:0 bounce:0
+  kernel_misc_reclaimable:0
+  free:4015222 free_pcp:95 free_cma:48
+ [ 131.659904] Node 0 active_anon:660480kB inactive_anon:222080256kB active_file:896kB inactive_file:16669696kB unevictable:9024kB isolated(anon):0kB isolated(file):0kB mapped:222261312kB dirty:16669504kB writeback:0kB
+ shmem:222319552kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:47872kB shadow_call_stack:62144kB pagetables:28800kB sec_pagetables:0kB all_unreclaimable? yes
+ [ 131.659908] Node 0 DMA free:1041984kB boost:0kB min:69888kB low:87360kB high:104832kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:466176kB unevictable:0kB writepending:466176kB present:2097152kB managed:2029632kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:3072kB
+ [ 131.659911] lowmem_reserve[]: 0 0 15189 15189 15189
+ [ 131.659915] Node 0 Normal free:8566336kB boost:0kB min:8575808kB low:10719744kB high:12863680kB reserved_highatomic:0KB active_anon:660480kB inactive_anon:222080256kB active_file:896kB inactive_file:16203520kB unevictable:9024kB writepending:16203328kB present:249244544kB managed:248932800kB mlocked:0kB bounce:0kB free_pcp:6080kB local_pcp:6080kB free_cma:0kB
+ [ 131.659918] lowmem_reserve[]: 0 0 0 0 0
+ [ 131.659922] Node 0 DMA: 1*64kB (M) 0*128kB 2*256kB (UM) 2*512kB (UM) 2*1024kB (UC) 3*2048kB (UMC) 2*4096kB (UM) 1*8192kB (U) 2*16384kB (UM) 2*32768kB (UM) 2*65536kB (UM) 2*131072kB (UM) 2*262144kB (UM) 0*524288kB = 1041984kB
+ [ 131.659936] Node 0 Normal: 439*64kB (UE) 333*128kB (UME) 192*256kB (UME) 91*512kB (UME) 31*1024kB (UE) 12*2048kB (UME) 5*4096kB (UE) 2*8192kB (U) 3*16384kB (UME) 2*32768kB (UE) 1*65536kB (U) 2*131072kB (UE) 2*262144kB (UE) 14*524288kB (M) = 8566336kB
+ [ 131.659952] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB
+ [ 131.659955] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
+ [ 131.659956] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
+ [ 131.659957] 3748043 total pagecache pages
+ [ 131.659959] 6731 pages in swap cache
+ [ 131.659960] Free swap = 0kB
+ [ 131.659961] Total swap = 8388544kB
+ [ 131.659961] 7858556 pages RAM
+ [ 131.659962] 0 pages HighMem/MovableOnly
+ [ 131.659963] 12344 pages reserved
+ [ 131.659964] 8192 pages cma reserved
+ [ 131.659965] 0 pages hwpoisoned

  [Fix]

- * The upstream patch wakes up flusher threads if there are too many dirty entries in the coldest LRU generation
- * This happens when trying to shrink lruvecs, so reclaim only gets woken up during high memory pressure
- * Fix was introduced by commit:
- 1bc542c6a0d1 mm/vmscan: wake up flushers conditionally to avoid cgroup OOM
+ * The upstream patch wakes up flusher threads if there are too many dirty entries in the coldest LRU generation
+ * This happens when trying to shrink lruvecs, so reclaim only gets woken up during high memory pressure
+ * Fix was introduced by commit:
+ 1bc542c6a0d1 mm/vmscan: wake up flushers conditionally to avoid cgroup OOM

  [Regression Potential]

- * This commit fixes the memory reclaim path, so regressions would likely show up during increased system memory pressure
- * According to the upstream patch, increased SSD/disk wearing is possible due to waking up flusher threads, although these have not been noted in testing
+ * This commit fixes the memory reclaim path, so regressions would likely show up during increased system memory pressure
+ * According to the upstream patch, increased SSD/disk wearing is possible due to waking up flusher threads, although these have not been noted in testing
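The alloc_and_crash.c reproducer and the membind script referenced in the
Test Plan above are bug attachments and are not included in this
notification. As a rough sketch of what the Test Plan describes (the 64 GiB
size and the use of raise() to force the crash are assumptions for
illustration, not the contents of the attached file):

/*
 * Sketch of the reproducer described above: mmap a huge anonymous region,
 * memset it so every page is allocated and dirtied, then force a SEGFAULT
 * so the kernel writes out a large core dump (the dump_user_range/ext4
 * writes visible in the trace). Size and crash mechanism are assumptions.
 */
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64UL << 30;    /* assumed size; tune to the target node's memory */

    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    /* Touch every page so it is actually allocated and dirty. */
    memset(buf, 0xa5, len);

    /* Forcibly SEGFAULT; dumping core then drives heavy page-cache writes. */
    raise(SIGSEGV);
    return 0;
}

The attached script presumably wraps this with something like
"numactl --membind=0 ./alloc_and_crash" so that the allocations and the
core-dump writeback are confined to NUMA node 0, where the allocation
failures then show up in dmesg.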
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2097214

Title:
  MGLRU: page allocation failure on NUMA-enabled systems

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Noble:
  Confirmed
Status in linux source package in Oracular:
  Confirmed
Status in linux source package in Plucky:
  Confirmed

Bug description:
  [Impact]

  * On MGLRU-enabled systems, high memory pressure on NUMA nodes will cause
    page allocation failures
  * This happens due to page reclaim not waking up flusher threads
  * OOM can be triggered even if the system has enough available memory

  [Test Plan]

  * For the bug to trigger properly, we should uninstall apport and use the
    attached alloc_and_crash.c reproducer
  * alloc_and_crash will mmap a huge range of memory, memset it and then
    force a SEGFAULT
  * The attached bash script will membind alloc_and_crash to NUMA node 0, so
    we can see the allocation failures in dmesg

  $ sudo apt remove --purge apport
  $ sudo dmesg -c; ./lp2097214-repro.sh; sleep 2; sudo dmesg

  [ 124.974328] nvme 0014:01:00.0: Using 48-bit DMA addresses
  [ 131.659813] alloc_and_crash: page allocation failure: order:0, mode:0x141cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_WRITE), nodemask=0,cpuset=/,mems_allowed=0-1
  [ 131.659827] CPU: 114 PID: 2758 Comm: alloc_and_crash Not tainted 6.8.0-1021-nvidia-64k #23-Ubuntu
  [ 131.659830] Hardware name: Supermicro MBD-G1SMH/G1SMH, BIOS 1.0c 12/28/2023
  [ 131.659832] Call trace:
  [ 131.659834] dump_backtrace+0xa4/0x150
  [ 131.659841] show_stack+0x24/0x50
  [ 131.659843] dump_stack_lvl+0xc8/0x138
  [ 131.659847] dump_stack+0x1c/0x38
  [ 131.659849] warn_alloc+0x16c/0x1f0
  [ 131.659853] __alloc_pages_slowpath.constprop.0+0x8e4/0x9f0
  [ 131.659855] __alloc_pages+0x2f0/0x3a8
  [ 131.659857] alloc_pages_mpol+0x94/0x290
  [ 131.659860] alloc_pages+0x6c/0x118
  [ 131.659861] folio_alloc+0x24/0x98
  [ 131.659862] filemap_alloc_folio+0x168/0x188
  [ 131.659865] __filemap_get_folio+0x1bc/0x3f8
  [ 131.659867] ext4_da_write_begin+0x144/0x300
  [ 131.659870] generic_perform_write+0xc4/0x228
  [ 131.659872] ext4_buffered_write_iter+0x78/0x180
  [ 131.659874] ext4_file_write_iter+0x44/0xf0
  [ 131.659876] __kernel_write_iter+0x10c/0x2c0
  [ 131.659878] dump_user_range+0xe0/0x240
  [ 131.659881] elf_core_dump+0x4cc/0x538
  [ 131.659884] do_coredump+0x574/0x988
  [ 131.659885] get_signal+0x7dc/0x8f0
  [ 131.659887] do_signal+0x138/0x1f8
  [ 131.659888] do_notify_resume+0x114/0x298
  [ 131.659890] el0_da+0xdc/0x178
  [ 131.659892] el0t_64_sync_handler+0xdc/0x158
  [ 131.659894] el0t_64_sync+0x1b0/0x1b8
  [ 131.659896] Mem-Info:
  [ 131.659901] active_anon:12408 inactive_anon:3470004 isolated_anon:0
   active_file:2437 inactive_file:264544 isolated_file:0
   unevictable:609 dirty:260589 writeback:0
   slab_reclaimable:9016 slab_unreclaimable:34145
   mapped:3473656 shmem:3474196 pagetables:610
   sec_pagetables:0 bounce:0
   kernel_misc_reclaimable:0
   free:4015222 free_pcp:95 free_cma:48
  [ 131.659904] Node 0 active_anon:660480kB inactive_anon:222080256kB active_file:896kB inactive_file:16669696kB unevictable:9024kB isolated(anon):0kB isolated(file):0kB mapped:222261312kB dirty:16669504kB writeback:0kB shmem:222319552kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:47872kB shadow_call_stack:62144kB pagetables:28800kB sec_pagetables:0kB all_unreclaimable? yes
  [ 131.659908] Node 0 DMA free:1041984kB boost:0kB min:69888kB low:87360kB high:104832kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:466176kB unevictable:0kB writepending:466176kB present:2097152kB managed:2029632kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:3072kB
  [ 131.659911] lowmem_reserve[]: 0 0 15189 15189 15189
  [ 131.659915] Node 0 Normal free:8566336kB boost:0kB min:8575808kB low:10719744kB high:12863680kB reserved_highatomic:0KB active_anon:660480kB inactive_anon:222080256kB active_file:896kB inactive_file:16203520kB unevictable:9024kB writepending:16203328kB present:249244544kB managed:248932800kB mlocked:0kB bounce:0kB free_pcp:6080kB local_pcp:6080kB free_cma:0kB
  [ 131.659918] lowmem_reserve[]: 0 0 0 0 0
  [ 131.659922] Node 0 DMA: 1*64kB (M) 0*128kB 2*256kB (UM) 2*512kB (UM) 2*1024kB (UC) 3*2048kB (UMC) 2*4096kB (UM) 1*8192kB (U) 2*16384kB (UM) 2*32768kB (UM) 2*65536kB (UM) 2*131072kB (UM) 2*262144kB (UM) 0*524288kB = 1041984kB
  [ 131.659936] Node 0 Normal: 439*64kB (UE) 333*128kB (UME) 192*256kB (UME) 91*512kB (UME) 31*1024kB (UE) 12*2048kB (UME) 5*4096kB (UE) 2*8192kB (U) 3*16384kB (UME) 2*32768kB (UE) 1*65536kB (U) 2*131072kB (UE) 2*262144kB (UE) 14*524288kB (M) = 8566336kB
  [ 131.659952] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB
  [ 131.659955] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
  [ 131.659956] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
  [ 131.659957] 3748043 total pagecache pages
  [ 131.659959] 6731 pages in swap cache
  [ 131.659960] Free swap = 0kB
  [ 131.659961] Total swap = 8388544kB
  [ 131.659961] 7858556 pages RAM
  [ 131.659962] 0 pages HighMem/MovableOnly
  [ 131.659963] 12344 pages reserved
  [ 131.659964] 8192 pages cma reserved
  [ 131.659965] 0 pages hwpoisoned

  [Fix]

  * The upstream patch wakes up flusher threads if there are too many dirty
    entries in the coldest LRU generation
  * This happens when trying to shrink lruvecs, so the flusher wakeup only
    occurs during high memory pressure
  * Fix was introduced by commit:
    1bc542c6a0d1 mm/vmscan: wake up flushers conditionally to avoid cgroup OOM

  [Regression Potential]

  * This commit fixes the memory reclaim path, so regressions would likely show
    up during increased system memory pressure
  * According to the upstream patch, increased SSD/disk wear is possible due
    to waking up flusher threads, although this has not been observed in testing

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2097214/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp