On 30/05/2017 10:53, Roman Pen wrote: > The fix is obvious: use temporal queue and do not touch coroutine after > first qemu_coroutine_enter() is invoked. > > The issue is quite rare and happens every ~12 hours on very high IO load > (building linux kernel with -j512 inside guest) when IO throttling is > enabled. With the fix applied guest is running ~35 hours and is still > alive so far.
This is a good one. Thanks! Just a couple notes on English below. > diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c > index 6328eed26bc6..026f5297f0f9 100644 > --- a/util/qemu-coroutine-lock.c > +++ b/util/qemu-coroutine-lock.c > @@ -77,10 +77,23 @@ void coroutine_fn qemu_co_queue_wait(CoQueue *queue, > CoMutex *mutex) > void qemu_co_queue_run_restart(Coroutine *co) > { > Coroutine *next; > + QSIMPLEQ_HEAD(, Coroutine) tmp_queue_wakeup = > + QSIMPLEQ_HEAD_INITIALIZER(tmp_queue_wakeup); > > trace_qemu_co_queue_run_restart(co); > - while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) { > - QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next); > + > + /* > + * We use temporal queue in order not to touch 'co' after entering > + * 'next' coroutine. The thing is that 'next' coroutine can resume > + * current 'co' coroutine and eventually terminate (free) it (see > + * linux-aio.c: ioq_submit() where qemu_laio_process_completions() > + * is invoked). The rule of thumb is simple: do not touch coroutine > + * when you enter another one. > + */ /* Because "co" has yielded, any coroutine that we wakeup can resume it. * If this happens and "co" terminates, co->co_queue_wakeup becomes * invalid memory. Therefore, use a temporary queue and do not touch * the "co" coroutine as soon as you enter another one. */ > + QSIMPLEQ_CONCAT(&tmp_queue_wakeup, &co->co_queue_wakeup); > + > + while ((next = QSIMPLEQ_FIRST(&tmp_queue_wakeup))) { > + QSIMPLEQ_REMOVE_HEAD(&tmp_queue_wakeup, co_queue_next); > qemu_coroutine_enter(next); > } > } > diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c > index 486af9a62275..d020b63742af 100644 > --- a/util/qemu-coroutine.c > +++ b/util/qemu-coroutine.c > @@ -126,6 +126,13 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine > *co) > > qemu_co_queue_run_restart(co); > > + /* > + * BEWARE: in case of ret == COROUTINE_YIELD here at this point > + * after qemu_co_queue_run_restart() 'co' can be already > + * freed by other coroutine, which has entered 'co'. So > + * be careful and do not touch it. > + */ /* Beware, if ret == COROUTINE_YIELD and qemu_co_queue_run_restart() * has started any other coroutine, "co" might have been reentered * and even freed by now! So be careful and do not touch it. */ > switch (ret) { > case COROUTINE_YIELD: > return; > Paolo