On Thu, Nov 07, 2024 at 10:54:29PM +1100, Michael Ellerman wrote: > Gautam Menghani <gau...@linux.ibm.com> writes: > > As per the kernel documentation[1], hardlockup detector should be > > disabled in KVM guests as it may give false positives. On PPC, hardlockup > > detector is broken inside KVM guests because disable_hardlockup_detector() > > Isn't it the opposite? Inside KVM guests, the hardlockup detector should > be *disabled*, but it's not it's *enabled*, due to this bug. > > ie. it's not broken, it's working, but that's the bug.
Yes right, will change the description in v2. > > > is marked as early_initcall and it uses is_kvm_guest(), which is > > initialized by check_kvm_guest() later during boot as it is a > > core_initcall. check_kvm_guest() is also called in pSeries_smp_probe(), > > which is called before initcalls, but it is skipped if KVM guest does > > not have doorbell support or if the guest is launched with SMT=1. > > I'm wondering how no one has noticed. Most KVM guests have SMT=1. Looking at the commit history, code around hardlockups and pSeries_smp_probe() was changed around 2021/2022 timeframe, and I believe KVM wasn't being actively tested at the time. Even I noticed this only after coming across the documentation that said hardlockups should be disabled. So probably this feature decision isn't widely known. > > > Move the check_kvm_guest() call in pSeries_smp_probe() to the initial > > part of function before doorbell/SMT checks so that "kvm_guest" static > > key is initialized by the time disable_hardlockup_detector() runs. > > check_kvm_guest() is safe to be called multiple times so > disable_hardlockup_detector() should just call it before it calls > is_kvm_guest(). That should avoid future breakage when the order of > calls changes, or someone refactors pSeries_smp_probe(). Yeah I did that initially but in the worst case, that results in 3 calls to check_kvm_guest() - the core_initcall, pseries_smp_probe() call and then disable_hardlockup_detector(). Will that be fine? > > Can you identify the commit that broke this and include a Fixes: tag > please. Yes will do Thanks, Gautam