This bug is awaiting verification that the linux-gcp-tcpx/6.8.0-1002.3 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-gcp-tcpx' to 'verification-done-jammy-linux-gcp-tcpx'. If the problem still exists, change the tag 'verification-needed-jammy-linux-gcp-tcpx' to 'verification-failed-jammy-linux-gcp-tcpx'.
If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-jammy-linux-gcp-tcpx-v2 verification-needed-jammy-linux-gcp-tcpx

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2076147

Title:
  Add 'mm: hold PTL from the first PTE while reclaiming a large folio' to fix L2 Guest hang during LTP Test

Status in The Ubuntu-power-systems project:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Noble:
  Fix Released
Status in linux source package in Oracular:
  Fix Released

Bug description:
  SRU Justification:

  [ Impact ]

  * A KVM 2nd level guest (that is, a KVM VM that runs nested on top of a Power 10 PowerVM hypervisor) hangs during the LTP (Linux Test Project) test suite.

  * It hangs with:
    "Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab"

  * Diagnosing the issue points to this fix/upstream-commit:

  [commit message, by Barry Song <v-songbao...@oppo.com>]

  Within try_to_unmap_one(), page_vma_mapped_walk() races with other PTE modifications preceded by pte clear. While iterating over PTEs of a large folio, it only starts acquiring PTL from the first valid (present) PTE. PTE modifications can temporarily set PTEs to pte_none. Consequently, the initial PTEs of a large folio might be skipped in try_to_unmap_one().

  For example, for an anon folio, if we skip PTE0, we may have PTE0 which is still present, while PTE1 ~ PTE(nr_pages - 1) are swap entries after try_to_unmap_one(). So folio will be still mapped, the folio fails to be reclaimed and is put back to LRU in this round.

  This also breaks up PTEs optimization such as CONT-PTE on this large folio and may lead to accident folio_split() afterwards.
  And since a part of PTEs are now swap entries, accessing those parts will introduce overhead - do_swap_page.

  Although the kernel can withstand all of the above issues, the situation still seems quite awkward and warrants making it more ideal.

  The same race also occurs with small folios, but they have only one PTE, thus, it won't be possible for them to be partially unmapped.

  This patch [see below] holds PTL from PTE0, allowing us to avoid reading PTE values that are in the process of being transformed. With stable PTE values, we can ensure that this large folio is either completely reclaimed or that all PTEs remain untouched in this round.

  A corner case is that if we hold PTL from PTE0 and most initial PTEs have been really unmapped before that, we may increase the duration of holding PTL. Thus we only apply this optimization to folios which are still entirely mapped (not in deferred_split list).

  [ Fix ]

  * 73bc32875ee9 73bc32875ee9b1881dd780308c6793fe463fe803 "mm: hold PTL from the first PTE while reclaiming a large folio"

  [ Test Plan ]

  * An IBM Power 10 system (where PowerVM is mandatory) running Ubuntu Server 24.04 (kernel 6.8) or later with (nested) KVM setup (so KVM on top of PowerVM).

  * Run LTP test suite
    Tests running: SLS(io,base)

  * Without the patch the above test will hang with:
    Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab

  [ Where problems could occur ]

  * This is a common code change in the memory management sub-system, hence great care needs to be taken, even if it was discussed upfront at the https://lore.kernel.org/ mailing list and the upstream commit provenance shows that many eyes had a look at this.

  * The modification is relatively small with just one if statement (across two lines) in mm/vmscan.c.

  * This change is to assist 'try_to_unmap' to acquire page table locks (PTL) from the first page table entry (PTE) and to eliminate the influence of temporary and volatile PTE values.
  * If done wrong, it can have a negative impact especially in the case of large folios, and wrong hints might be given to try_to_unmap, which may lead to bad page swapping.

  * In case of an issue with this patch, the result can also be decreased performance and efficiency in the page table handling - the opposite of what the patch is supposed to address.

  * Fortunately several developers had their eyes on this commit, as the provenance of the patch and the discussion at LKML show.

  * Further upstream conversation:
    Link: https://lkml.kernel.org/r/20240306095219.71086-1-21cn...@gmail.com

  [ Other Info ]

  * The commit is upstream since v6.10(-rc1), hence it will be included in oracular with the planned target kernel of 6.11.

  * And since (nested) KVM virtualization on ppc64el was (re-)introduced just with noble, no Ubuntu releases older than noble are affected.

  __________

  == Comment: #0 - SEETEENA THOUFEEK <sthou...@in.ibm.com> - 2024-08-06 00:20:57 ==

  +++ This bug was initially created as a clone of Bug #206372 +++

  ---Problem Description---
  L2 Guest hung during LTP Tests.
  Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab

  ---uname output---
  NA

  ---Additional Hardware Info---
  NA

  Contact Information = na

  ---Debugger Data---
  NA

  ---Patches Installed---
  NA

  ---Steps to Reproduce---
  Tests running: SLS(io,base)

  LPAR Config:
  ============
  PHYP Environment: PowerVM
  LPAR Hostname/IP: 10.33.2.107
  Rootvg Filesystem: xfs
  Network Interface: Shiner-T
  vNIC/SR-IOV Config: n/a
  IO Type: SAN
  IO Disk Type: raw
  Multipath Enabled: No
  -------------------------------------------------------------------------------------
  DUMP Config:
  ============
  KDUMP configured: Yes
  XMON enabled: no
  DUMP Available: no

  Machine Type = na

  Userspace rpm: NA

  The userspace tool has the following bit modes: NA

  Userspace tool obtained from project website: na

  Userspace tool common name: NA

  *Additional Instructions for na:
  -Post a private note with access information to the machine that is currently in the debugger.
  -Attach ltrace and strace of userspace application.

  Please include this commit in Ubuntu 24.04.

  Upstream commit which is solving these data store lockups:
  73bc32875ee9b1881dd780308c6793fe463fe803 mm: hold PTL from the first PTE while reclaiming a large folio

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/2076147/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
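
Illustrative note on the race described in the quoted commit message: the following is a toy Python model, not kernel code. It only mimics the outcome the commit message describes: a walker that starts taking the lock at the first PTE that looks present can skip leading PTEs that a concurrent writer has temporarily cleared, leaving the large folio partially unmapped. The names unmap_large_folio, should_hold_ptl_from_first_pte, transient_none and hold_ptl_from_first_pte are hypothetical and exist only for this sketch. The rule "large folio and not on the deferred_split list" is taken from the commit message text above, not from the patch itself.

```python
from typing import List, Set


def unmap_large_folio(nr_pages: int,
                      transient_none: Set[int],
                      hold_ptl_from_first_pte: bool) -> List[str]:
    """Toy stand-in for one unmap pass over the PTEs of a single folio.

    transient_none holds the indices that a concurrent writer has
    temporarily cleared; they are really present, but look like pte_none
    to a reader that does not hold the page table lock (PTL) yet.
    """
    ptes = ["present"] * nr_pages
    lock_held = hold_ptl_from_first_pte
    for i in range(nr_pages):
        if not lock_held:
            if i in transient_none:
                # Looked empty without the lock: this PTE is skipped.
                continue
            # First PTE that looks present: the lock is taken from here on.
            lock_held = True
        # With the lock held the value is stable, so the PTE is unmapped
        # and replaced by a swap entry.
        ptes[i] = "swap"
    return ptes


def should_hold_ptl_from_first_pte(nr_pages: int,
                                   on_deferred_split_list: bool) -> bool:
    """Condition paraphrased from the commit message: only large folios
    that are still entirely mapped (not on the deferred_split list)."""
    return nr_pages > 1 and not on_deferred_split_list


if __name__ == "__main__":
    # PTE0 is transiently cleared while the pass runs.
    print(unmap_large_folio(4, {0}, hold_ptl_from_first_pte=False))
    # ['present', 'swap', 'swap', 'swap']  -> folio is still mapped
    print(unmap_large_folio(4, {0}, hold_ptl_from_first_pte=True))
    # ['swap', 'swap', 'swap', 'swap']     -> folio can be reclaimed
```

In this model the first call reproduces the "PTE0 still present, PTE1 ~ PTE(nr_pages - 1) are swap entries" state from the commit message, and the second call shows the all-or-nothing result the fix aims for. A single-page folio (nr_pages == 1) can never end up partially unmapped, which is why the condition excludes it.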