On 19.09.2022 14:25, Tamas K Lengyel wrote:
> On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeul...@suse.com> wrote:
>>
>> On 16.09.2022 23:35, Boris Ostrovsky wrote:
>>>
>>> On 9/16/22 8:52 AM, Jan Beulich wrote:
>>>> On 15.09.2022 16:01, Tamas K Lengyel wrote:
>>>>> While experimenting with the vPMU subsystem an ASSERT failure was
>>>>> observed in vmx_find_msr because the vcpu_runnable state was true.
>>>>>
>>>>> The root cause of the bug appears to be the fact that the vPMU subsystem
>>>>> doesn't save its state on context_switch.

For the further reply below - is this actually true? What is the
vpmu_switch_from() (resolving to vpmu_save()) doing then early
in context_switch()?

>>>>> The vpmu_load function will attempt
>>>>> to gather the PMU state if its still loaded two different ways:
>>>>>      1. if the current pcpu is not where the vcpu ran before doing a 
>>>>> remote save
>>>>>      2. if the current pcpu had another vcpu active before doing a local 
>>>>> save
>>>>>
>>>>> However, in case the prev vcpu is being rescheduled on another pcpu its 
>>>>> state
>>>>> has already changed and vcpu_runnable is returning true, thus #2 will 
>>>>> trip the
>>>>> ASSERT. The only way to avoid this race condition is to make sure the
>>>>> prev vcpu is paused while being checked and its context saved. Once the 
>>>>> prev
>>>>> vcpu is resumed and does #1 it will find its state already saved.
>>>> While I consider this explanation plausible, I'm worried:
>>>>
>>>>> --- a/xen/arch/x86/cpu/vpmu.c
>>>>> +++ b/xen/arch/x86/cpu/vpmu.c
>>>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
>>>>>           vpmu = vcpu_vpmu(prev);
>>>>>
>>>>>           /* Someone ran here before us */
>>>>> +        vcpu_pause(prev);
>>>>>           vpmu_save_force(prev);
>>>>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>>>>> +        vcpu_unpause(prev);
>>>>>
>>>>>           vpmu = vcpu_vpmu(v);
>>>>>       }
>>>> We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
>>>> to actually be de-scheduled. Even with IRQs on this is already a
>>>> relatively heavy operation (also including its impact on the remote
>>>> side). Additionally the function is called from context_switch(), and
>>>> I'm unsure of the usability of vcpu_pause() on such a path. In
>>>> particular: Is there a risk of two CPUs doing this mutually to one
>>>> another? If so, is deadlocking excluded?
>>>>
>>>> Hence at the very least I think the description wants extending, to
>>>> discuss the safety of the change.
>>>>
>>>> Boris - any chance you could comment here? Iirc that's code you did
>>>> introduce.
>>>
>>>
>>> Is the assertion in vmx_find_msr() really needs to be for runnable vcpu or 
>>> can it be a check on whether vcpu is actually running (e.g. 
>>> RUNSTATE_running)?
>>
>> You cannot safely check for "running", as "runnable" may transition
>> to/from "running" behind your back.
> 
> The more I look at this the more I think the only sensible solution is
> to have the vPMU state be saved on vmexit for all vCPUs.

Do you really mean vmexit? It would suffice if state was reliably
saved during context-switch-out, wouldn't it? At that point the
vCPU can't be resumed on another pCPU, yet.

> That way all
> this having to figure out where and when a context needs saving during
> scheduling goes away. Yes, it adds a bit of overhead for cases where
> the vCPU will resume on the same pCPU and that context saved could
> have been skipped,

If you really mean vmexit, then I'm inclined to say that's more
than just "a bit of overhead". I'd agree if you really meant
context-switch-out, but as said further up it looks to me as if
that was already occurring. Apparently I'm overlooking something
crucial ...

Jan

> but it makes it so that the vCPU can be resumed on
> any pCPU without having to have evidently fragile checks that may
> potentially lead to deadlocks (TBH I don't know if that's a real
> concern at the moment because the current setup is very hard to reason
> about). We can still keep track if the context needs reloading from
> the saved context and skip that if we know the state is still active.
> Any objection to that change in light of these issues?
> 
> Tamas


Reply via email to