I've looked into this on ThunderX2. The arm64 code generated for the atomic_[add|sub] accesses of ctx->notify_me doesn't contain any memory barriers. It is just plain ldaxr/stlxr.
>From my understanding this is not sufficient for SMP sync. If I read this comment correct: void aio_notify(AioContext *ctx) { /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. */ smp_mb(); if (ctx->notify_me) { it points out that the smp_mb() should be paired. But as I said the used atomics don't generate any barriers at all. I've tried to verify me theory with this patch and didn't run into the issue for ~500 iterations (usually I would trigger the issue ~20 iterations). --Jan diff --git a/util/aio-posix.c b/util/aio-posix.c index d8f0cb4af8dd..d07dcd4e9993 100644 --- a/util/aio-posix.c +++ b/util/aio-posix.c @@ -591,6 +591,7 @@ bool aio_poll(AioContext *ctx, bool blocking) */ if (blocking) { atomic_add(&ctx->notify_me, 2); + smp_mb(); } qemu_lockcnt_inc(&ctx->list_lock); @@ -632,6 +633,7 @@ bool aio_poll(AioContext *ctx, bool blocking) if (blocking) { atomic_sub(&ctx->notify_me, 2); + smp_mb(); } /* Adjust polling time */ diff --git a/util/async.c b/util/async.c index 4dd9d95a9e73..92ac209c4615 100644 --- a/util/async.c +++ b/util/async.c @@ -222,6 +222,7 @@ aio_ctx_prepare(GSource *source, gint *timeout) AioContext *ctx = (AioContext *) source; atomic_or(&ctx->notify_me, 1); + smp_mb(); /* We assume there is no timeout already supplied */ *timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)); @@ -240,6 +241,7 @@ aio_ctx_check(GSource *source) QEMUBH *bh; atomic_and(&ctx->notify_me, ~1); + smp_mb(); aio_notify_accept(ctx); for (bh = ctx->first_bh; bh; bh = bh->next) {