On Thu, Feb 27, 2025, Manali Shukla wrote:
> On 2/27/2025 8:02 PM, Sean Christopherson wrote:
> > On Thu, Feb 27, 2025, Manali Shukla wrote:
> >> On 2/26/2025 3:37 AM, Sean Christopherson wrote:
> >>>> @@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void)
> >>>>                  if (vnmi)
> >>>>                          kvm_cpu_cap_set(X86_FEATURE_VNMI);
> >>>>  
> >>>> +                kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT);
> >>>
> > > I am 99% certain this is wrong.  Or at the very least, severely lacking
> > > an explanation of why it's correct.  If L1 enables Idle HLT but not HLT
> > > interception, then it is KVM's responsibility to NOT exit to L1 if there
> > > is a pending V_IRQ or V_NMI.
> >>>
> > > Yeah, it's buggy.  But, it's buggy in part because *existing* KVM support
> > > is buggy.  If L1 disables HLT exiting, but it's enabled in KVM, then KVM
> > > will run L2 with HLT exiting and so it becomes KVM's responsibility to
> > > check for valid L2 wake events prior to scheduling out the vCPU if L2
> > > executes HLT.  E.g. nVMX handles this by reading
> > > vmcs02.GUEST_INTERRUPT_STATUS.RVI as part of vmx_has_nested_events().
> > > I don't see the equivalent in nSVM.
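
Concretely, the nSVM side would presumably need something like the below,
wired into svm_nested_ops the same way vmx_nested_ops sets .has_events.
This is a completely untested sketch from memory: the helper name is
illustrative, and the hook's exact signature has changed across kernel
versions.

static bool svm_has_nested_events(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);
	u32 int_ctl = svm->nested.vmcb02.ptr->control.int_ctl;

	/*
	 * A V_IRQ or V_NMI that L1 stuffed into vmcb02 is a valid wake
	 * event for L2 even though it's invisible to L0's own event
	 * tracking, a la vmcs02.GUEST_INTERRUPT_STATUS.RVI on VMX.
	 */
	return (int_ctl & V_IRQ_MASK) || (int_ctl & V_NMI_PENDING_MASK);
}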
> >>>
> > > Amusingly, that means Idle HLT is actually a bug fix to some extent.
> > > E.g. if there is a pending V_IRQ/V_NMI in vmcb02, then running with Idle
> > > HLT will naturally do the right thing, i.e. not hang the vCPU.
> >>>
> > > Anyways, for now, I think the easiest and best option is to simply skip
> > > full nested support for the moment.
> >>>
> >>
> >> Got it, I see the issue you're talking about. I'll need to look into it a
> >> bit more to fully understand it. So yeah, we can hold off on full nested
> >> support for the Idle HLT intercept for now.
> >>
> >> Since we are planning to disable Idle HLT support for nested guests,
> >> should we do something like this?
> >>
> >> @@ -167,10 +167,15 @@ void recalc_intercepts(struct vcpu_svm *svm)
> >>         if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
> >>                 vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
> >>
> >> +       if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_IDLE_HLT))
> >> +               vmcb_clr_intercept(c, INTERCEPT_IDLE_HLT);
> >> +
> >>
> >> When recalc_intercepts() copies the intercept values from vmcb01 to
> >> vmcb02, it also copies the IDLE HLT intercept bit, which is set to 1 in
> >> vmcb01. Normally this isn't a problem, because the HLT intercept takes
> >> priority when it's on. But if the HLT intercept gets turned off for some
> >> reason, the IDLE HLT intercept will stay on, which is not what we want.
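
For reference, this happens because recalc_intercepts() builds vmcb02's
intercepts by first copying vmcb01's set and then ORing in whatever L1
requested, so nothing on that path ever clears a bit that KVM set for L1's
benefit.  Roughly (paraphrased from memory, details vary by kernel version):

	c = &svm->vmcb->control;		/* vmcb02 */
	h = &svm->vmcb01.ptr->control;		/* vmcb01 */
	g = &svm->nested.ctl;			/* L1's requested controls */

	for (i = 0; i < MAX_INTERCEPT; i++)
		c->intercepts[i] = h->intercepts[i];
	...
	for (i = 0; i < MAX_INTERCEPT; i++)
		c->intercepts[i] |= g->intercepts[i];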
> > 
> > Why don't we want that?
> 
> The idle-HLT intercept remains '1' for the L2 guest. Now, when L2 executes
> HLT and there is no pending event available, it will still take an idle-HLT
> exit, even though Idle HLT was never explicitly enabled for the L2 guest.

Yes, but why is that a problem?  L1 doesn't want to intercept HLT, so KVM isn't
violating the architecture by not intercepting HLT.
