On 24.10.2019 at 15:03, Vladimir Sementsov-Ogievskiy wrote:
> 24.10.2019 13:57, Kevin Wolf wrote:
> > On 24.10.2019 at 12:01, Denis Lunev wrote:
> >> On 10/23/19 6:26 PM, Kevin Wolf wrote:
> >>> qcow2_cache_do_get() requires that s->lock is locked because it can
> >>> yield between picking a cache entry and actually taking ownership of it
> >>> by setting offset and increasing the reference count.
> >>>
> >>> Add an assertion to make sure the caller really holds the lock. The
> >>> function can be called outside of coroutine context, where bdrv_pread
> >>> and flushes become synchronous operations. The lock cannot and need not
> >>> be taken in this case.
> >>>
> >>> Signed-off-by: Kevin Wolf <kw...@redhat.com>
> >>> ---
> >>>  block/qcow2-cache.c | 5 +++++
> >>>  1 file changed, 5 insertions(+)
> >>>
> >>> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> >>> index d29b038a67..75b13dad99 100644
> >>> --- a/block/qcow2-cache.c
> >>> +++ b/block/qcow2-cache.c
> >>> @@ -327,6 +327,9 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
> >>>      int min_lru_index = -1;
> >>>
> >>>      assert(offset != 0);
> >>> +    if (qemu_in_coroutine()) {
> >>> +        qemu_co_mutex_assert_locked(&s->lock);
> >>> +    }
> >>
> >> That doesn't look good to me. If this really requires the lock, we
> >> should check for the lock always. On the other hand, we could face a
> >> missed lock outside of coroutine context.
> >
> > As the commit message explains, outside of coroutine context we can't
> > yield, and bdrv_pread and bdrv_flush become synchronous operations
> > instead, so there is nothing else that we need to protect against.
> >
> 
> Recently we discussed similar problems with the block-dirty-bitmap-*
> QMP commands, which want to update qcow2 metadata about bitmaps from
> non-coroutine context. "qcow2 lock"
> <135df452-397a-30bb-7518-2184fa597...@virtuozzo.com>
> https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg01419.html
Hm, right, I had already forgotten about the nested event loop again...

> And, as I understand it, the correct way is to convert to a coroutine
> and lock the mutex appropriately. Or we want to lock the AioContext
> and be in a drained section to avoid parallel requests accessing the
> critical section. Should we assert such conditions in the
> !qemu_in_coroutine() case?

The AioContext lock must be held anyway, so I don't think this would be
a new requirement.

As for draining, I'll have to see; I'm still auditing all the callers
of qcow2_cache_do_get(). The synchronous callers I know of so far are
the snapshot functions. I think these happen to be in a drained section
anyway (or at least should be), so AioContext lock + drain seems like a
very reasonable option for them. For other synchronous callers, if any,
conversion to a coroutine might make more sense.

Kevin
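
For illustration, here is a rough sketch of how the two cases discussed
above could be asserted together. Only the coroutine branch corresponds
to the posted patch; the drained-section check via bs->quiesce_counter
and the helper name are assumptions about how the "AioContext lock +
drain" option might be expressed, not something this patch implements:

/*
 * Sketch only, written as if it lived in block/qcow2-cache.c (so
 * qemu/osdep.h, qcow2.h and the coroutine headers are already
 * included there). The !qemu_in_coroutine() branch is an assumption.
 */
static void qcow2_cache_assert_caller_safe(BlockDriverState *bs)
{
    BDRVQcow2State *s = bs->opaque;

    if (qemu_in_coroutine()) {
        /*
         * Coroutine callers can yield in bdrv_pread()/flush, so they
         * must hold s->lock across qcow2_cache_do_get().
         */
        qemu_co_mutex_assert_locked(&s->lock);
    } else {
        /*
         * Synchronous callers run a nested event loop instead. Assumed
         * check: require a drained node so that no parallel requests
         * can enter the critical section in the meantime.
         */
        assert(bs->quiesce_counter > 0);
    }
}

A synchronous caller such as the snapshot code would then be expected
to follow the usual pattern (again only a sketch of the "AioContext
lock + drain" option, not taken from the patch):

    AioContext *ctx = bdrv_get_aio_context(bs);

    aio_context_acquire(ctx);
    bdrv_drained_begin(bs);
    /* ... synchronous qcow2 metadata update using the L2/refcount cache ... */
    bdrv_drained_end(bs);
    aio_context_release(ctx);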