Public bug reported:

[Impact]
 * On MGLRU-enabled systems, high memory pressure on NUMA nodes will cause page allocation failures
 * This happens due to page reclaim not waking up flusher threads
 * OOM can be triggered even if the system has enough available memory

[Test Plan]
 * For the bug to trigger properly, uninstall apport and use the attached alloc_and_crash.c reproducer
 * alloc_and_crash will mmap a huge range of memory, memset it and then forcibly SEGFAULT
 * The attached bash script will membind alloc_and_crash to NUMA node 0, so we can see the allocation failures in dmesg

 $ sudo apt remove --purge apport
 $ sudo dmesg -c; ./repro.bash; sleep 2; sudo dmesg

[Fix]
 * The upstream patch wakes up flusher threads if there are too many dirty entries in the coldest LRU generation
 * This happens when trying to shrink lruvecs, so the flushers only get woken up during high memory pressure
 * Fix was introduced by commit:

 1bc542c6a0d1 mm/vmscan: wake up flushers conditionally to avoid cgroup OOM

[Regression Potential]
 * This commit fixes the memory reclaim path, so regressions would likely show up during increased system memory pressure
 * According to the upstream patch, increased SSD/disk wear is possible due to waking up flusher threads, although this has not been observed in testing

** Affects: linux (Ubuntu)
     Importance: High
     Assignee: Heitor Alves de Siqueira (halves)
         Status: Confirmed

** Affects: linux (Ubuntu Noble)
     Importance: High
     Assignee: Heitor Alves de Siqueira (halves)
         Status: Confirmed

** Affects: linux (Ubuntu Oracular)
     Importance: Medium
     Assignee: Heitor Alves de Siqueira (halves)
         Status: Confirmed

** Affects: linux (Ubuntu Plucky)
     Importance: High
     Assignee: Heitor Alves de Siqueira (halves)
         Status: Confirmed

** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Oracular)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Plucky)
   Importance: High
     Assignee: Heitor Alves de Siqueira (halves)
       Status: Confirmed

** Changed in: linux (Ubuntu Oracular)
     Assignee: (unassigned) => Heitor Alves de Siqueira (halves)

** Changed in: linux (Ubuntu Noble)
     Assignee: (unassigned) => Heitor Alves de Siqueira (halves)

** Changed in: linux (Ubuntu Oracular)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Noble)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Oracular)
   Importance: High => Medium

** Changed in: linux (Ubuntu Oracular)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Noble)
       Status: New => Confirmed

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2097214

Title:
  MGLRU: page allocation failure on NUMA-enabled systems

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Noble:
  Confirmed
Status in linux source package in Oracular:
  Confirmed
Status in linux source package in Plucky:
  Confirmed

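The reproducer described in the Test Plan (mmap a large range, memset it, then die with SIGSEGV) can be sketched roughly as below. This is a hypothetical stand-in and not the attached alloc_and_crash.c: the names map_and_touch and crash_with_segv, the ALLOC_AND_CRASH_MAIN build guard, the 0xA5 fill byte and the 8 GiB default size are all assumptions made for illustration, and a 64-bit Linux system is assumed. The crash is presumably there so that the kernel writes out a core dump, which would also explain why apport has to be removed first; the report does not spell this out.

```c
/* Hypothetical sketch only; NOT the alloc_and_crash.c attached to the bug. */
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map `len` bytes of anonymous memory and write to every byte, so the
 * kernel has to back the whole range with real pages.
 * Returns the mapping, or NULL if mmap() fails.
 */
void *map_and_touch(size_t len)
{
    assert(len > 0); /* mmap() rejects zero-length mappings */

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memset(p, 0xA5, len); /* fill byte is arbitrary (assumption) */
    return p;
}

/* Terminate the process with SIGSEGV on purpose. */
void crash_with_segv(void)
{
    signal(SIGSEGV, SIG_DFL); /* make sure the default action applies */
    raise(SIGSEGV);
}

#ifdef ALLOC_AND_CRASH_MAIN
int main(int argc, char **argv)
{
    /* Size in GiB from argv[1]; the default of 8 is a placeholder. */
    size_t gib = (argc > 1) ? (size_t)strtoull(argv[1], NULL, 10) : 8;
    if (gib == 0) {
        fprintf(stderr, "usage: %s [size-in-GiB]\n", argv[0]);
        return 2;
    }

    if (map_and_touch(gib << 30) == NULL) {
        perror("mmap");
        return 1;
    }

    crash_with_segv();
    return 0; /* not reached */
}
#endif
```

Built with `cc -DALLOC_AND_CRASH_MAIN -o alloc_and_crash sketch.c`, the sketch could be bound to node 0 with something like `numactl --membind=0 ./alloc_and_crash 64`. Whether the attached repro.bash uses numactl is not stated in the report, and the size would presumably need to exceed the free memory on node 0 for reclaim to come under pressure.
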
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2097214/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp