On 17/01/2024 at 15:45, Friedrich Weber wrote:
> Users have been reporting [1] that VMs occasionally become
> unresponsive with high CPU usage for some time (varying between ~1
> and more than 60 seconds). After that time, the guests come back and
> continue running. Windows VMs seem most affected (not responding to
> pings during the hang, RDP sessions time out), but we also got
> reports about Linux VMs (reporting soft lockups). The issue was not
> present on host kernel 5.15 and was first reported with kernel 6.2.
> Users reported that the issue becomes easier to trigger the more
> memory is assigned to the guests. Setting mitigations=off was
> reported to alleviate (but not eliminate) the issue. For most users
> the issue seems to disappear after (also) disabling KSM [2], but
> some users reported freezes even with KSM disabled [3].
>
> It turned out the reports concerned NUMA hosts only, and that the
> freezes correlated with runs of the NUMA balancer [4]. Users
> reported that disabling the NUMA balancer resolves the issue (even
> with KSM enabled).
>
> We put together a Linux VM reproducer, ran a git-bisect on the
> kernel to find the commit introducing the issue, and asked upstream
> for help [5]. As it turned out, an upstream bug report had recently
> been opened [6] and a preliminary fix to the KVM TDP MMU was
> proposed [7]. With that patch [7] on top of kernel 6.7, the
> reproducer does not trigger freezes anymore. As of now, the patch
> (or its v2 [8]) is not yet merged into the mainline kernel, and
> backporting it may be difficult due to dependencies on other KVM
> changes [9].
>
> However, the bug report [6] also prompted an upstream developer to
> propose a patch to the kernel scheduler logic that decides whether a
> contended spinlock/rwlock should be dropped [10]. Without the patch,
> PREEMPT_DYNAMIC kernels (such as ours) always drop contended locks.
> With the patch, the kernel only drops contended locks if it is
> currently set to preempt=full. As noted in the commit message [10],
> this can (counter-intuitively) improve KVM performance. Our kernel
> defaults to preempt=voluntary (according to
> /sys/kernel/debug/sched/preempt), so with the patch it does not drop
> contended locks anymore, and the reproducer does not trigger freezes
> anymore. Hence, backport [10] to our kernel.
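>
> For reference, the gist of [10] is roughly the following (a sketch
> of the concept only, not the literal patch; see [10] for the real
> diff -- preempt_model_preemptible() and spin_is_contended() are
> existing kernel helpers):
>
>     /* sketch: spin_needbreak() with a runtime preempt-model check */
>     static inline int spin_needbreak(spinlock_t *lock)
>     {
>             /*
>              * PREEMPT_DYNAMIC kernels are built with
>              * CONFIG_PREEMPTION, so without this runtime check, lock
>              * holders (such as KVM's MMU lock) yield on contention
>              * even when running with preempt=voluntary.
>              */
>             if (!preempt_model_preemptible())
>                     return 0;
>
>             return spin_is_contended(lock);
>     }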
>
> [1] https://forum.proxmox.com/threads/130727/
> [2] https://forum.proxmox.com/threads/130727/page-4#post-575886
> [3] https://forum.proxmox.com/threads/130727/page-8#post-617587
> [4] https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html#numa-balancing
> [5] https://lore.kernel.org/kvm/832697b9-3652-422d-a019-8c0574a18...@proxmox.com/
> [6] https://bugzilla.kernel.org/show_bug.cgi?id=218259
> [7] https://lore.kernel.org/all/20230825020733.2849862-1-sea...@google.com/
> [8] https://lore.kernel.org/all/20240110012045.505046-1-sea...@google.com/
> [9] https://lore.kernel.org/kvm/zaa654hwfkba_...@google.com/
> [10] https://lore.kernel.org/all/20240110214723.695930-1-sea...@google.com/
>
> Signed-off-by: Friedrich Weber <f.we...@proxmox.com>
> ---
>
> Notes:
>     This RFC is not meant to be applied immediately, but is intended
>     to sum up the current state of the issue and point out potential
>     fixes.
>
>     The patch [10] backported in this RFC hasn't been reviewed
>     upstream yet. And while it fixes the reproducer, it is not
>     certain that it will fix the freezes seen by users on real-world
>     workloads. Hence, it would be desirable to also apply some
>     variant of [7]/[8] once it is applied upstream; however, there
>     may be difficulties backporting it, as noted above.
>
>     So, in any case, for now it might make sense to monitor how
>     upstream handles the situation, and then react accordingly. I'll
>     continue to participate upstream and send a v2 in due time.
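>
>     As an interim workaround, the NUMA balancer can be disabled at
>     runtime (assuming the standard sysctl interface documented in
>     [4]; the KSM step below is optional and setup-specific):
>
>         # disable automatic NUMA balancing
>         sysctl -w kernel.numa_balancing=0
>         # optionally stop KSM and unmerge already-shared pages
>         echo 2 > /sys/kernel/mm/ksm/run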
>
>  ...spinlocks-on-contention-iff-kernel-i.patch | 78 +++++++++++++++++++
>  1 file changed, 78 insertions(+)
>  create mode 100644 patches/kernel/0018-sched-core-Drop-spinlocks-on-contention-iff-kernel-i.patch

This was actually already applied for 6.5.13-1, thanks!