mshv_partition_ioctl_create_vp() initialises a VP struct (allocations,
mutex_init, init_waitqueue_head, page mappings) and then publishes the
pointer into partition->pt_vp_array. Several ISR paths read this array
locklessly: the intercept ISR, the two scheduler ISRs, and
mshv_try_assert_irq_fast() on the irqfd fast path.
Of these, only mshv_try_assert_irq_fast() can structurally race the
publish. It runs from an eventfd waker without holding pt_mutex, and
MSHV_IRQFD does not require the target lapic_apic_id (== vp_index) to
refer to an existing VP at registration time. A user can therefore
register an irqfd targeting a yet-to-be-created VP, then trigger
mshv_try_assert_irq_fast() concurrently with MSHV_CREATE_VP for the
same index. On weakly-ordered architectures the reader can observe a
non-NULL pointer in pt_vp_array before the initialising stores to the
VP struct become visible, leading to use of partially-initialised
fields (e.g. vp_register_page).
The other ISR readers cannot reach this race: the hypervisor will not
generate intercept or scheduler messages for a VP that has never been
told to run, and the user can only call MSHV_RUN_VP on the VP fd
returned by MSHV_CREATE_VP, which by construction is returned after
the publish. Leave those readers as plain loads.
Use smp_store_release() in mshv_partition_ioctl_create_vp() to publish
the pointer, and pair it with smp_load_acquire() in
mshv_try_assert_irq_fast(). On x86 these compile to plain accesses
under TSO; on ARM64 they emit one-instruction acquire/release barriers,
acceptable on this fast path.
The destroy-side path (destroy_partition() clearing pt_vp_array[i] to
NULL after kfree(vp)) has a separate ordering and lifetime concern
that is out of scope here.
Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose
/dev/mshv to VMMs")
Signed-off-by: Stanislav Kinsburskii <[email protected]>
---
drivers/hv/mshv_eventfd.c | 9 ++++++++-
drivers/hv/mshv_root_main.c | 8 +++++++-
2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/hv/mshv_eventfd.c b/drivers/hv/mshv_eventfd.c
index 11a6006f80194..5f0dd243e1445 100644
--- a/drivers/hv/mshv_eventfd.c
+++ b/drivers/hv/mshv_eventfd.c
@@ -197,7 +197,14 @@ static int mshv_try_assert_irq_fast(struct mshv_irqfd
*irqfd,
if (irq->lapic_apic_id >= MSHV_MAX_VPS)
return -EINVAL;
- vp = partition->pt_vp_array[irq->lapic_apic_id];
+ /*
+ * Pairs with smp_store_release() in mshv_partition_ioctl_create_vp().
+ * MSHV_IRQFD does not require the target lapic_apic_id to refer to an
+ * existing VP, so this read can race a concurrent VP creation; the
+ * acquire ensures that a non-NULL pointer implies the VP's
+ * initialising stores are visible.
+ */
+ vp = smp_load_acquire(&partition->pt_vp_array[irq->lapic_apic_id]);
if (!vp)
return -EINVAL;
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 7e4252b6bc65c..381aa86c5b90e 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1224,7 +1224,13 @@ mshv_partition_ioctl_create_vp(struct mshv_partition
*partition,
/* already exclusive with the partition mutex for all ioctls */
partition->pt_vp_count++;
- partition->pt_vp_array[args.vp_index] = vp;
+ /*
+ * Pairs with smp_load_acquire() in mshv_try_assert_irq_fast(), which
+ * can run concurrently from an irqfd waker without holding pt_mutex.
+ * The release ensures the VP's initialising stores are visible to any
+ * reader that observes a non-NULL pointer in pt_vp_array.
+ */
+ smp_store_release(&partition->pt_vp_array[args.vp_index], vp);
goto out;