Public bug reported:

[Impact]
 * On MGLRU-enabled systems, high memory pressure on NUMA nodes will cause page
   allocation failures
 * This happens due to page reclaim not waking up flusher threads
 * OOM can be triggered even if the system has enough available memory

[Test Plan]
 * For the bug to properly trigger, we should uninstall apport and use the
   attached alloc_and_crash.c reproducer
 * alloc_and_crash will mmap a huge range of memory, memset it and forcibly
   SEGFAULT
 * The attached bash script will membind alloc_and_crash to NUMA node 0, so we
   can see the allocation failures in dmesg
   $ sudo apt remove --purge apport
   $ sudo dmesg -c; ./repro.bash; sleep 2; sudo dmesg
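
The attached alloc_and_crash.c is not reproduced in this report. As a rough
sketch of the behaviour described above (mmap a large range, memset it, then
force a SEGFAULT), something like the following could serve as a starting
point. The function names map_and_dirty and crash_with_segv and the fill
byte are hypothetical, chosen only for illustration, and the real reproducer
may differ.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Map `len` bytes of anonymous memory and write to every byte, so the
 * kernel has to back the whole range with real pages.
 * Returns the mapping, or NULL if mmap() fails. */
void *map_and_dirty(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memset(p, 0xA5, len);   /* arbitrary fill byte (assumption) */
    return p;
}

/* Terminate with SIGSEGV while the mapping is still populated. Whether a
 * core file is written depends on the system's core dump settings. */
void crash_with_segv(void)
{
    signal(SIGSEGV, SIG_DFL);   /* make sure no handler intercepts it */
    raise(SIGSEGV);
}
```

A main() built on this sketch would call map_and_dirty() with a length large
enough to put pressure on the bound NUMA node and then call
crash_with_segv(); the appropriate length depends on the test machine.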

[Fix]
 * The upstream patch wakes up flusher threads if there are too many dirty
   entries in the coldest LRU generation
 * This happens when trying to shrink lruvecs, so the flushers only get woken
   up during high memory pressure
 * Fix was introduced by commit:
     1bc542c6a0d1 mm/vmscan: wake up flushers conditionally to avoid cgroup OOM
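
To make the condition above concrete, here is a simplified user-space model
of such a check. It is not the kernel code: the struct, its field names, the
function name, and the exact threshold (every file folio taken from the
coldest generation was dirty and not yet queued for writeback) are
assumptions made for illustration. The authoritative logic is in the commit
itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-pass counters; not the kernel's own structure. */
struct reclaim_counts {
    size_t file_taken;     /* file folios taken from the coldest generation */
    size_t unqueued_dirty; /* of those, dirty and not queued for writeback */
};

/* Assumed threshold: wake the flushers only when at least one file folio
 * was taken and every one of them was dirty and unqueued. */
bool should_wake_flushers(const struct reclaim_counts *c)
{
    return c->unqueued_dirty != 0 && c->unqueued_dirty == c->file_taken;
}
```

Under this model, a pass that finds only a few dirty entries leaves the
flushers alone, which is consistent with the wakeup being limited to high
memory pressure.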

[Regression Potential]
 * This commit fixes the memory reclaim path, so regressions would likely show
   up during increased system memory pressure
 * According to the upstream patch, increased SSD/disk wear is possible due
   to waking up flusher threads, although this has not been observed in testing

** Affects: linux (Ubuntu)
     Importance: High
     Assignee: Heitor Alves de Siqueira (halves)
         Status: Confirmed

** Affects: linux (Ubuntu Noble)
     Importance: High
     Assignee: Heitor Alves de Siqueira (halves)
         Status: Confirmed

** Affects: linux (Ubuntu Oracular)
     Importance: Medium
     Assignee: Heitor Alves de Siqueira (halves)
         Status: Confirmed

** Affects: linux (Ubuntu Plucky)
     Importance: High
     Assignee: Heitor Alves de Siqueira (halves)
         Status: Confirmed

** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Oracular)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Plucky)
   Importance: High
     Assignee: Heitor Alves de Siqueira (halves)
       Status: Confirmed

** Changed in: linux (Ubuntu Oracular)
     Assignee: (unassigned) => Heitor Alves de Siqueira (halves)

** Changed in: linux (Ubuntu Noble)
     Assignee: (unassigned) => Heitor Alves de Siqueira (halves)

** Changed in: linux (Ubuntu Oracular)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Noble)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Oracular)
   Importance: High => Medium

** Changed in: linux (Ubuntu Oracular)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Noble)
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2097214

Title:
  MGLRU: page allocation failure on NUMA-enabled systems

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Noble:
  Confirmed
Status in linux source package in Oracular:
  Confirmed
Status in linux source package in Plucky:
  Confirmed

Bug description:
  [Impact]
   * On MGLRU-enabled systems, high memory pressure on NUMA nodes will cause page
     allocation failures
   * This happens due to page reclaim not waking up flusher threads
   * OOM can be triggered even if the system has enough available memory

  [Test Plan]
   * For the bug to properly trigger, we should uninstall apport and use the
     attached alloc_and_crash.c reproducer
   * alloc_and_crash will mmap a huge range of memory, memset it and forcibly
     SEGFAULT
   * The attached bash script will membind alloc_and_crash to NUMA node 0, so we
     can see the allocation failures in dmesg
     $ sudo apt remove --purge apport
     $ sudo dmesg -c; ./repro.bash; sleep 2; sudo dmesg

  [Fix]
   * The upstream patch wakes up flusher threads if there are too many dirty
     entries in the coldest LRU generation
   * This happens when trying to shrink lruvecs, so the flushers only get woken
     up during high memory pressure
   * Fix was introduced by commit:
       1bc542c6a0d1 mm/vmscan: wake up flushers conditionally to avoid cgroup OOM

  [Regression Potential]
   * This commit fixes the memory reclaim path, so regressions would likely show
     up during increased system memory pressure
   * According to the upstream patch, increased SSD/disk wear is possible due
     to waking up flusher threads, although this has not been observed in testing

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2097214/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
