On 24/05/2022 at 14:10, Kevin Wolf wrote:
> On 18.05.2022 at 14:28, Emanuele Giuseppe Esposito wrote:
>> label: // read till the end to see why I wrote this here
>>
>> I was hoping someone from the "No" party would answer to your question,
>> because as you know we reached this same conclusion together.
>>
>> We thought about dropping the drain for various reasons, the main one
>> (at least as far as I understood) being that we are not sure whether
>> something can still happen between drain_begin/end, and it is a bit
>> confusing to use the same mechanism both to block I/O and to protect
>> the graph.
>>
>> We then thought about implementing a rwlock. A rdlock would clarify what
>> we are protecting and who is using the lock. I sent a draft rwlock
>> implementation in this thread, but it also led to additional problems.
>> The main problem was that this new lock would introduce nested event
>> loops, which together with such locking would just create deadlocks.
>> If readers are in coroutines and writers are not (because graph
>> operations are not running in coroutines), we get a lot of deadlocks.
>> If a writer has to take the lock, it must wait for all other readers to
>> finish. It does this by internally calling AIO_WAIT_WHILE(), creating a
>> nested event loop. We don't know what could execute while polling for
>> events; for example, another writer could be resumed.
>
> Why is this a problem? Your AIO_WAIT_WHILE() condition would be that
> there are neither readers nor writers, so you would just keep waiting
> until the other writer is done.
Yes, but when we get to the AIO_WAIT_WHILE() condition the wrlock is
already held by the current writer.
I think that's what you also mean below.
>
>> Ideally, we want writers in coroutines too.
>>
>> Additionally, many readers run in what we call "mixed" functions:
>> usually implemented automatically via the generated_co_wrapper tag,
>> they let a function (usually a bdrv callback) always run in a
>> coroutine, creating one if necessary. For example, bdrv_flush() makes
>> sure that bs->bdrv_co_flush() is always run in a coroutine.
>> Such mixed functions are used in other callbacks too, making it really
>> difficult to understand whether we are in a coroutine or not, and, most
>> importantly, making rwlock usage very difficult.
>
> How do they make rwlock usage difficult?
>
> *goes back to old IRC discussions*
>
> Ah, the problem was not the AIO_WAIT_WHILE() while taking the lock, but
> taking the lock first and then running an AIO_WAIT_WHILE() inside the
> locked section. This is forbidden because the callbacks that run during
> AIO_WAIT_WHILE() may in turn wait for the lock that you own, causing a
> deadlock.
>
Yes
> This is indeed a problem that running in coroutines would avoid because
> the inner waiter would just yield and the outer one could complete its
> job as soon as it's its turn.
>
> My conclusion in the IRC discussion was that maybe we need to take the
> graph locks when we're entering coroutine context, i.e. the "mixed"
> functions would rdlock the graph when called from non-coroutine context
> and would assume that it's already locked when called from coroutine
> context.
Yes, and that's what I tried to do.
But the first step was to transform all callbacks into coroutines. I
think you also agree with this, correct?
The easiest first step was therefore to convert all callbacks into
generated_co_wrapper functions, so that afterwards we could split them
into coroutine and non-coroutine logic, as discussed on IRC.
Once split, we add the lock in the way you suggested.
However, I didn't even get to the first step part, because tests were
deadlocking after just transforming 2-3 callbacks.
See Paolo's thread for a nice explanation of why they deadlock and why
converting these callbacks is difficult.
>
>> This led us to step back once more and try to convert all
>> BlockDriverState callbacks into coroutines. This would greatly simplify
>> rwlock usage, because we could make the rwlock coroutine-friendly
>> (without any AIO_WAIT_WHILE, allowing a writer to wait for readers by
>> just yielding and queuing itself in coroutine queues).
>>
>> The first step was then to convert all callbacks into coroutines,
>> using generated_coroutine_wrapper (g_c_w).
>> A typical g_c_w is implemented in this way:
>>
>>     if (qemu_in_coroutine()) {
>>         callback();
>>     } else { // much simplified
>>         co = qemu_coroutine_create(callback);
>>         bdrv_coroutine_enter(bs, co);
>>         BDRV_POLL_WHILE(bs, coroutine_in_progress);
>>     }
>>
>> Once all callbacks are implemented using g_c_w, we can start splitting
>> the two branches of the if statement so that a coroutine is only
>> created when we are outside of a bdrv callback.
>>
>> However, we immediately found a problem when starting to convert the
>> first callbacks: the AioContext lock is taken around some non-coroutine
>> callbacks! For example, bs->bdrv_open() is always called with the
>> AioContext lock taken. In addition, callbacks like bdrv_open are
>> graph-modifying functions, which is probably why we are taking the
>> AioContext lock, and they do not like to run in coroutines.
>> Anyway, the real problem comes when we create a coroutine in such
>> places where the AioContext lock is taken and we have a graph-modifying
>> function.
>>
>> bdrv_coroutine_enter() calls aio_co_enter(), which in turn first checks
>> whether the coroutine is entering a context different from the current
>> one (not the case for open) and whether we are already in a coroutine
>> (definitely not). It therefore resorts to the following calls:
>>
>>     aio_context_acquire(ctx);
>>     qemu_aio_coroutine_enter(ctx, co);
>>     aio_context_release(ctx);
>>
>> This is clearly a problem, because we are taking the lock twice: once
>> in the original caller of the callback, and once here due to the
>> coroutine. This creates a lot of deadlock situations.
>
> What are the deadlock situations that are created by locking twice?
Scratch this, and refer to Paolo's thread.
>
> The only problem I'm aware of is AIO_WAIT_WHILE(), which wants to
> temporarily unlock the AioContext. It calls aio_context_release() once
> to achieve this, which obviously isn't enough when the context was
> locked twice.
>
> But AIO_WAIT_WHILE() isn't allowed in coroutines anyway. So how are we
> running into deadlocks here?
>
> Note that we're probably already doing this inside the .bdrv_open
> implementations: They will usually read something from the image file,
> calling bdrv_preadv() which is already a generated_coroutine_wrapper
> today and creates a coroutine internally with the same locking pattern
> applied that you describe as problematic here.
>
> Making .bdrv_open itself a generated_coroutine_wrapper wouldn't really
> change anything fundamental, it would just pull the existing mechanism
> one function higher in the call stack.
>
>> For example, all callers of bdrv_open() always take the AioContext lock.
>> Often it is taken very high in the call stack, but it's always taken.
>>
>> Getting rid of the lock around qemu_aio_coroutine_enter() is difficult
>> too, because coroutines expect to have the lock taken. For example, if
>> we want to drain from a coroutine, bdrv_co_yield_to_drain releases the
>> lock for us.
>
> It's not difficult at all in your case where you know that you're
> already in the right thread and the lock is taken: You can call
> qemu_aio_coroutine_enter() directly instead of bdrv_coroutine_enter() in
> this case.
>
> But as I said, I'm not sure why we need to get rid of it at all.
>
> Kevin
>