On Thu, Apr 08, 2021 at 05:08:20PM +0300, Vladimir Sementsov-Ogievskiy wrote: > With the following patch we want to call aio_co_wake() from thread. > And it works bad. > Assume we have no iothreads. > Assume we have a coroutine A, which waits in the yield point for external > aio_co_wake(), and no progress can be done until it happen. > Main thread is in blocking aio_poll() (for example, in blk_read()). > > Now, in a separate thread we do aio_co_wake(). It calls aio_co_enter(), > which goes through last "else" branch and do aio_context_acquire(ctx). > > Now we have a deadlock, as aio_poll() will not release the context lock > until some progress is done, and progress can't be done until > aio_co_wake() wake the coroutine A. And it can't because it wait for > aio_context_acquire(). > > Still, aio_co_schedule() works well in parallel with blocking > aio_poll(). So let's use it in generic case and drop > aio_context_acquire/aio_context_release branch from aio_co_enter(). > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> > --- > util/async.c | 11 ++--------- > 1 file changed, 2 insertions(+), 9 deletions(-) > > diff --git a/util/async.c b/util/async.c > index 674dbefb7c..f05b883a39 100644 > --- a/util/async.c > +++ b/util/async.c > @@ -614,19 +614,12 @@ void aio_co_wake(struct Coroutine *co) > > void aio_co_enter(AioContext *ctx, struct Coroutine *co) > { > - if (ctx != qemu_get_current_aio_context()) { > - aio_co_schedule(ctx, co); > - return; > - } > - > - if (qemu_in_coroutine()) { > + if (ctx == qemu_get_current_aio_context() && qemu_in_coroutine()) { > Coroutine *self = qemu_coroutine_self(); > assert(self != co); > QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next); > } else { > - aio_context_acquire(ctx); > - qemu_aio_coroutine_enter(ctx, co); > - aio_context_release(ctx); > + aio_co_schedule(ctx, co); > } > }
I'm fine with the change, but I find the log message to be a bit confusing (although correct). AFAICS the problem is that calling aio_co_enter from a thread which has no associated aio_context works differently compared to calling it from a proper iothread: if the target context was qemu_aio_context, an iothread would just schedule the coroutine there, while a "dumb" thread would try lock the context potentially resulting in a deadlock. This patch makes "dumb" threads and iothreads behave identically when entering a coroutine on a foreign context. You may want to rephrase the log message to that end. Anyway Reviewed-by: Roman Kagan <rvka...@yandex-team.ru>