Amit Machhiwal <[email protected]> writes: > On IBM POWER systems, newer processor generations can operate in > compatibility modes corresponding to earlier generations. This becomes > relevant for nested virtualization, where nested KVM guests may need to > run with a specific processor compatibility level. > > Currently, when running a nested KVM guest (L2) inside a Power11 pSeries > logical partition (L1) booted in Power10 compatibility mode, the guest > fails to boot while setting 'arch_compat'. This happens because the CPU > class is derived from the hardware PVR (via mfspr()), which reflects the > physical processor generation (Power11), rather than the effective > compatibility mode (Power10). > > As a result, userspace may request a Power11 arch_compat for the L2 > guest. However, the L1 partition, running in Power10 compatibility, has > only negotiated support up to Power10 with the Power Hypervisor (L0). > When H_SET_STATE is invoked with a Power11 Logical PVR, the hypervisor
s/H_SET_STATE/H_GUEST_SET_STATE > rejects the request, leading to a late guest boot failure: > > KVM-NESTEDv2: couldn't set guest wide elements > [..KVM reg dump..] > I think irrespective of the other UAPI changes, we should still get this fixed - so that we don't see a late KVM guest boot failure msgs. So, in this review, I would like to mainly look at fixing this issue first and would request if we can defer the UAPI changes as a separate patch series please. > This situation should be detected earlier. Rejecting unsupported > 'arch_compat' values in 'kvmppc_set_arch_compat()' avoids issuing an > invalid H_SET_STATE hcall and provides a clearer failure mode. s/H_SET_STATE/H_GUEST_SET_STATE > > Add a check to reject Power11 'arch_compat' requests when the host is > running in Power10 compatibility mode, returning -EINVAL early instead > of deferring the failure to the hypervisor. > > Suggested-by: Vaibhav Jain <[email protected]> > Tested-by: Anushree Mathur <[email protected]> > Signed-off-by: Amit Machhiwal <[email protected]> > --- > arch/powerpc/kvm/book3s_hv.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 61dbeea317f3..249d1f2e4e2c 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -446,7 +446,19 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, > u32 arch_compat) > guest_pcr_bit = PCR_ARCH_300; > break; > case PVR_ARCH_31: > + guest_pcr_bit = PCR_ARCH_31; > + break; > case PVR_ARCH_31_P11: > + /* > + * Need to check this for ISA 3.1, as Power10 and > + * Power11 share the same PCR. For any subsequent ISA > + * versions, this will be taken care of by the guest vs > + * host PCR comparison below. > + */ > + if ((PVR_ARCH_31 & cur_cpu_spec->pvr_mask) == > + cur_cpu_spec->pvr_value) { > + return -EINVAL; > + } Instead of the complicated check can we simply do this? if (!cpu_has_feature(CPU_FTR_P11_PVR)) return -EINVAL; which means that if the Qemu is trying to set the arch_compat with P11 PVR (arch_compat) and if the host cpu FTR doesn't support P11 PVR, then simply return -EINVAL -ritesh
