On Thu, Feb 27, 2025, Manali Shukla wrote:
> On 2/26/2025 3:37 AM, Sean Christopherson wrote:
> >> @@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void)
> >>            if (vnmi)
> >>                    kvm_cpu_cap_set(X86_FEATURE_VNMI);
> >>  
> >> +          kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT);
> > 
> > I am 99% certain this is wrong.  Or at the very least, severly lacking an
> > explanation of why it's correct.  If L1 enables Idle HLT but not HLT 
> > interception,
> > then it is KVM's responsibility to NOT exit to L1 if there is a pending 
> > V_IRQ or
> > V_NMI.
> > 
> > Yeah, it's buggy.  But, it's buggy in part because *existing* KVM support 
> > is buggy.
> > If L1 disables HLT exiting, but it's enabled in KVM, then KVM will run L2 
> > with
> > HLT exiting and so it becomes KVM's responsibility to check for valid L2 
> > wake events
> > prior to scheduling out the vCPU if L2 executes HLT.  E.g. nVMX handles 
> > this by
> > reading vmcs02.GUEST_INTERRUPT_STATUS.RVI as part of 
> > vmx_has_nested_events().  I
> > don't see the equivalent in nSVM.
> > 
> > Amusingly, that means Idle HLT is actually a bug fix to some extent.  E.g. 
> > if there
> > is a pending V_IRQ/V_NMI in vmcb02, then running with Idle HLT will 
> > naturally do
> > the right thing, i.e. not hang the vCPU.
> > 
> > Anyways, for now, I think the easiest and best option is to simply skip 
> > full nested
> > support for the moment.
> > 
> 
> Got it, I see the issue you're talking about. I'll need to look into it a bit 
> more to
> fully understand it. So yeah, we can hold off on full nested support for idle 
> HLT 
> intercept for now.
> 
> Since we are planning to disable Idle HLT support on nested guests, should we 
> do
> something like this ?
> 
> @@ -167,10 +167,15 @@ void recalc_intercepts(struct vcpu_svm *svm)
>         if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
>                 vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
> 
> +       if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_IDLE_HLT))
> +               vmcb_clr_intercept(c, INTERCEPT_IDLE_HLT);
> +
> 
> When recalc_intercepts copies the intercept values from vmc01 to vmcb02, it 
> also copies
> the IDLE HLT intercept bit, which is set to 1 in vmcb01. Normally, this isn't 
> a problem 
> because the HLT intercept takes priority when it's on. But if the HLT 
> intercept gets 
> turned off for some reason, the IDLE HLT intercept will stay on, which is not 
> what we
> want.

Why don't we want that?

Reply via email to