events: fix race with set_global_virq_handler()

Juergen Gross Tue, 07 Jan 2025 07:57:37 -0800

On 07.01.25 16:49, Jan Beulich wrote:

On 07.01.2025 16:37, Juergen Gross wrote:

On 07.01.25 16:23, Jan Beulich wrote:

On 07.01.2025 11:17, Juergen Gross wrote:

--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -979,6 +979,7 @@ void send_global_virq(uint32_t virq)
   int set_global_virq_handler(struct domain *d, uint32_t virq)
   {
       struct domain *old;
+    int rc = 0;

if (virq >= NR_VIRQS)

           return -EINVAL;
@@ -992,14 +993,23 @@ int set_global_virq_handler(struct domain *d, uint32_t 
virq)
           return -EINVAL;

spin_lock(&global_virq_handlers_lock);

-    old = global_virq_handlers[virq];
-    global_virq_handlers[virq] = d;
+
+    if ( d->is_dying != DOMDYING_alive )
+    {
+        old = d;
+        rc = -EINVAL;
+    }


While I can see how this eliminates the zombie domain aspect, this doesn't
fully eliminate the race. Doing so would require (also) using the domain's
event lock. Assuming we're okay with the remaining race, imo a code comment
would be needed to state this (including the fact that it's then
unpredictable whether this operation might still succeed for a domain
already having d->is_dying != DOMDYING_alive).


AFAIU you mean that it is still possible to set a domain to handle a virq
when it is in the process of going down, especially if is_dying is set just
after it has been tested to be DOMDYING_alive?

I don't see this being a problem, as the same would happen if the domain
would go down just a millisecond later. This is something we will never be
able to handle.


Right, but the sequence of events in the case you mention is different: The
insertion into the array would still happen while the domain isn't marked
dying.

And after all the call of clear_global_virq_handlers() will now reset the
handling domain to the hardware domain in all cases.


Of course, but in the meantime an event may be sent to such a domain already
marked dying. That likely isn't going to cause problems, but is unexpected
with what description here says is being addressed.

Plus the way you do it the early success path remains; ideally that case
would also fail for an already dying domain.


Same again: clear_global_virq_handlers() will reset the handling domain.


Right.

In summary: As indicated, we may be okay with the remaining race, but then
we also should be making clear that we've decided to leave it at that.
Hence my earlier request: If we accept this, say (and briefly justify) this
in a code comment.


Okay, would you be fine with:

  Note that this check won't guarantee that a domain just going down can't be
  set as the handling domain of a virq, as the is_dying indicator might change
  just after testing it.
  This isn't going to be a major problem, as clear_global_virq_handlers() is
  guaranteed to run afterwards and it will reset the handling domain for the
  virq to the hardware domain.


Juergen

OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

OpenPGP_signature.asc
Description: OpenPGP digital signature

Re: [PATCH v6 1/7] xen/events: fix race with set_global_virq_handler()

Reply via email to