On 20.02.2021 20:47, Julien Grall wrote:
> From: Julien Grall <jgr...@amazon.com>
> 
> The comment in vcpu_block() states that the events should be checked
> /after/ blocking to avoids wakeup waiting race. However, from a generic
> perspective, set_bit() doesn't prevent re-ordering. So the following
> could happen:
> 
> CPU0  (blocking vCPU A)         |   CPU1 ( unblock vCPU A)
>                                 |
> A <- read local events          |
>                                 |   set local events
>                                 |   test_and_clear_bit(_VPF_blocked)
>                                 |       -> Bail out as the bit if not set
>                                 |
> set_bit(_VFP_blocked)           |
>                                 |
> check A                         |
> 
> The variable A will be 0 and therefore the vCPU will be blocked when it
> should continue running.
> 
> vcpu_block() is now gaining an smp_mb__after_atomic() to prevent the CPU
> to read any information about local events before the flag _VPF_blocked
> is set.
> 
> Signed-off-by: Julien Grall <jgr...@amazon.com>

Reviewed-by: Jan Beulich <jbeul...@suse.com>

> ---
> 
> This is a follow-up of the discussion that started in 2019 (see [1])
> regarding a possible race between do_poll()/vcpu_unblock() and the wake
> up path.
> 
> I haven't yet fully thought about the potential race in do_poll(). If
> there is, then this would likely want to be fixed in a separate patch.
> 
> On x86, the current code is safe because set_bit() is fully ordered. SO
> the problem is Arm (and potentially any new architectures).
> 
> I couldn't convince myself whether the Arm implementation of
> local_events_need_delivery() contains enough barrier to prevent the
> re-ordering. However, I don't think we want to play with devil here as
> the function may be optimized in the future.

In fact I think this ...

> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -1418,6 +1418,8 @@ void vcpu_block(void)
>  
>      set_bit(_VPF_blocked, &v->pause_flags);
>  
> +    smp_mb__after_atomic();
> +

... pattern should be looked for throughout the codebase, and barriers
be added unless it can be proven none is needed.

Jan

Reply via email to