Commit a373830f96db ("KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU
before running it to avoid spurious interrupts") has meanwhile landed in
v6.12-rc7.

commit a373830f96db288a3eb43a8692b6bcd0bd88dfe1
Author: Gautam Menghani <gau...@linux.ibm.com>
Date:   Mon Oct 28 14:34:09 2024 +0530

    KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid spurious interrupts
    
    Running a L2 vCPU (see [1] for terminology) with LPCR_MER bit set and no
    pending interrupts results in that L2 vCPU getting an infinite flood of
    spurious interrupts. The 'if check' in kvmhv_run_single_vcpu() sets the
    LPCR_MER bit if there are pending interrupts.
    
    The spurious flood problem can be observed in 2 cases:
    1. Crashing the guest while interrupt heavy workload is running
      a. Start a L2 guest and run an interrupt heavy workload (eg: ipistorm)
      b. While the workload is running, crash the guest (make sure kdump
         is configured)
      c. Any one of the vCPUs of the guest will start getting an infinite
         flood of spurious interrupts.
    
    2. Running LTP stress tests in multiple guests at the same time
       a. Start 4 L2 guests.
       b. Start running LTP stress tests on all 4 guests at same time.
       c. In some time, any one/more of the vCPUs of any of the guests will
          start getting an infinite flood of spurious interrupts.
    
    The root cause of both the above issues is the same:
    1. A NMI is sent to a running vCPU that has LPCR_MER bit set.
    2. In the NMI path, all registers are refreshed, i.e, H_GUEST_GET_STATE
       is called for all the registers.
    3. When H_GUEST_GET_STATE is called for LPCR, the vcpu->arch.vcore->lpcr
       of that vCPU at L1 level gets updated with LPCR_MER set to 1, and this
       new value is always used whenever that vCPU runs, regardless of whether
       there was a pending interrupt.
    4. Since LPCR_MER is set, the vCPU in L2 always jumps to the external
       interrupt handler, and this cycle never ends.
    
    Fix the spurious flood by masking off the LPCR_MER bit before running a
    L2 vCPU to ensure that it is not set if there are no pending interrupts.
    
    [1] Terminology:
    1. L0 : PAPR hypervisor running in HV mode
    2. L1 : Linux guest (logical partition) running on top of L0
    3. L2 : KVM guest running on top of L1
    
    Fixes: ec0f6639fa88 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0")
    Cc: sta...@vger.kernel.org # v6.8+
    Signed-off-by: Gautam Menghani <gau...@linux.ibm.com>
    Signed-off-by: Madhavan Srinivasan <ma...@linux.ibm.com>
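
For anyone following along without the diff at hand, the idea described in
the commit message can be sketched roughly as below. This is a standalone
illustration, not the upstream patch: the helper name, the macro name and
the bit position used for LPCR_MER are assumptions made for the example, and
the real logic lives in kvmhv_run_single_vcpu().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative bit position for LPCR_MER. This value is an assumption
 * made for the sketch; the authoritative definition is in the kernel's
 * powerpc register headers.
 */
#define SKETCH_LPCR_MER (UINT64_C(1) << 11)

/*
 * Hypothetical helper: derive the LPCR value used to enter the L2 vCPU
 * from the cached copy kept at L1 (vcpu->arch.vcore->lpcr in the commit
 * message). The cached copy may carry a stale MER bit after a register
 * refresh, so the bit is recomputed from the pending-interrupt state
 * instead of being trusted.
 */
static uint64_t sketch_lpcr_for_entry(uint64_t cached_lpcr, bool irq_pending)
{
    uint64_t lpcr = cached_lpcr;

    if (irq_pending)
        lpcr |= SKETCH_LPCR_MER;   /* request external interrupt delivery */
    else
        lpcr &= ~SKETCH_LPCR_MER;  /* mask off a possibly stale MER bit */

    return lpcr;
}
```

With a stale cached value such as SKETCH_LPCR_MER and no pending interrupt,
the helper returns a value with the bit cleared, which is the property the
fix relies on to stop the flood.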

Since the fix is upstream and properly tagged for stable, we are now waiting
on the Canonical Kernel team to pick it up.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2077722

Title:
  [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck
  at booting after triggering crash
