On 16.09.2022 23:35, Boris Ostrovsky wrote:
> 
> On 9/16/22 8:52 AM, Jan Beulich wrote:
>> On 15.09.2022 16:01, Tamas K Lengyel wrote:
>>> While experimenting with the vPMU subsystem an ASSERT failure was
>>> observed in vmx_find_msr because the vcpu_runnable state was true.
>>>
>>> The root cause of the bug appears to be the fact that the vPMU subsystem
>>> doesn't save its state on context_switch. The vpmu_load function will
>>> attempt to gather the PMU state if it's still loaded, in two different ways:
>>>      1. if the current pcpu is not where the vcpu ran before, by doing a remote save
>>>      2. if the current pcpu had another vcpu active before, by doing a local save
>>>
>>> However, in case the prev vcpu is being rescheduled on another pcpu, its
>>> state has already changed and vcpu_runnable returns true, so #2 will trip
>>> the ASSERT. The only way to avoid this race condition is to make sure the
>>> prev vcpu is paused while being checked and its context saved. Once the prev
>>> vcpu is resumed and does #1 it will find its state already saved.
>> While I consider this explanation plausible, I'm worried:
>>
>>> --- a/xen/arch/x86/cpu/vpmu.c
>>> +++ b/xen/arch/x86/cpu/vpmu.c
>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
>>>           vpmu = vcpu_vpmu(prev);
>>>   
>>>           /* Someone ran here before us */
>>> +        vcpu_pause(prev);
>>>           vpmu_save_force(prev);
>>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>>> +        vcpu_unpause(prev);
>>>   
>>>           vpmu = vcpu_vpmu(v);
>>>       }
>> We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
>> to actually be de-scheduled. Even with IRQs on this is already a
>> relatively heavy operation (also including its impact on the remote
>> side). Additionally the function is called from context_switch(), and
>> I'm unsure of the usability of vcpu_pause() on such a path. In
>> particular: Is there a risk of two CPUs doing this mutually to one
>> another? If so, is deadlocking excluded?
>>
>> Hence at the very least I think the description wants extending, to
>> discuss the safety of the change.
>>
>> Boris - any chance you could comment here? Iirc that's code you did
>> introduce.
> 
> 
> Does the assertion in vmx_find_msr() really need to be for a runnable vcpu,
> or could it instead check whether the vcpu is actually running (e.g. RUNSTATE_running)?

You cannot safely check for "running", as "runnable" may transition
to/from "running" behind your back.

Jan

> I think the call chain vpmu_load()->..->*_vpmu_save()->...->vmx_find_msr() is
> the only one where we are doing this for a non-current vcpu. If we can
> guarantee that the run state is set after the vpmu_load() call (maybe it
> already is, I haven't checked) then testing that state may avoid the assertion.
> 
> 
> -boris
> 