Am 31/03/2022 um 11:59 schrieb Paolo Bonzini:
> On 3/30/22 18:02, Paolo Bonzini wrote:
>> On 3/30/22 12:53, Hanna Reitz wrote:
>>>>
>>>> Seems a good compromise between drains and rwlock. What do you think?
>>>
>>> Well, sounds complicated. So I’m asking myself whether this would be
>>> noticeably better than just an RwLock for graph modifications, like
>>> the global lock Vladimir has proposed.
>
> [try again, this time with brain connected]
>
> A global lock would have to be taken by all iothreads on every I/O
> operation, even a CoRwLock would not scale because it has a global
> CoMutex inside and rdlock/unlock both take it. Even if the critical
> section is small, the cacheline bumping would be pretty bad.
>
> Perhaps we could reuse the code in cpus-common.c, which relies on a list
> of possible readers and is quite cheap (two memory barriers basically)
> for readers. Here we would have a list of AioContexts (or perhaps
> BlockDriverStates?) as the possible readers.
>
> The slow path uses a QemuMutex, a QemuCond for the readers and a
> QemuCond for the writers. The reader QemuCond can be replaced by a
> CoQueue, I think.
>
It would be something like this:
Essentially move graph_bdrv_states as public, so that we can iterate
through all bs as cpu-common.c does with cpus, and then duplicate the
already existing API in cpu-common.c:
bdrv_graph_list_wrlock <-> start_exclusive
bdrv_graph_list_wrunlock <-> end_exclusive
bdrv_graph_list_rdlock <-> cpu_exec_start
bdrv_graph_list_rdunlock <-> cpu_exec_end
The only difference is that there is no qemu_cpu_kick(), but I don't
think it matters.
What do you think?
diff --git a/block.c b/block.c
index 718e4cae8b..af8dd37101 100644
--- a/block.c
+++ b/block.c
@@ -52,6 +52,7 @@
#include "qemu/range.h"
#include "qemu/rcu.h"
#include "block/coroutines.h"
+#include "block/block-list.h"
#ifdef CONFIG_BSD
#include <sys/ioctl.h>
@@ -67,10 +68,6 @@
#define NOT_DONE 0x7fffffff /* used while emulated sync operation in
progress */
-/* Protected by BQL */
-static QTAILQ_HEAD(, BlockDriverState) graph_bdrv_states =
- QTAILQ_HEAD_INITIALIZER(graph_bdrv_states);
-
/* Protected by BQL */
static QTAILQ_HEAD(, BlockDriverState) all_bdrv_states =
QTAILQ_HEAD_INITIALIZER(all_bdrv_states);
diff --git a/include/block/block-list.h b/include/block/block-list.h
new file mode 100644
index 0000000000..d55a056938
--- /dev/null
+++ b/include/block/block-list.h
@@ -0,0 +1,54 @@
+#ifndef BLOCK_LIST_H
+#define BLOCK_LIST_H
+
+#include "qemu/osdep.h"
+
+
+/* Protected by BQL */
+QTAILQ_HEAD(,BlockDriverState) graph_bdrv_states =
QTAILQ_HEAD_INITIALIZER(graph_bdrv_states);
+
+void bdrv_init_graph_list_lock(void);
+
+/*
+ * bdrv_graph_list_wrlock:
+ * Modify the graph. Nobody else is allowed to access the graph.
+ * Set pending op >= 1, so that the next readers will wait in a
coroutine queue.
+ * Then keep track of the running readers using bs->has_writer,
+ * and wait until they finish.
+ */
+void bdrv_graph_list_wrlock(void);
+
+/*
+ * bdrv_graph_list_wrunlock:
+ * Write finished, reset pending operations to 0 and restart
+ * all readers that are waiting.
+ */
+void bdrv_graph_list_wrunlock(void);
+
+/*
+ * bdrv_graph_list_rdlock:
+ * Read the bs graph. Set bs->reading_graph true, and if there are pending
+ * operations, it means that the main loop is modifying the graph,
+ * therefore wait in exclusive_resume coroutine queue.
+ * If this read is coming while the writer has already marked the
+ * running read requests and it's waiting for them to finish,
+ * wait that the write finishes first.
+ * Main loop will then wake this coroutine once it is done.
+ */
+void bdrv_graph_list_rdlock(BlockDriverState *bs);
+
+/*
+ * bdrv_graph_list_rdunlock:
+ * Read terminated, decrease the count of pending operations.
+ * If it's the last read that the writer is waiting for, signal
+ * the writer that we are done and let it continue.
+ */
+void bdrv_graph_list_rdunlock(BlockDriverState *bs);
+
+
+
+
+#define BS_GRAPH_READER(bs) /* in main loop OR bs->reading_graph */
+#define BS_GRAPH_WRITER(bs) /* in main loop AND bs->bs_graph_pending_op
> 0 */
+#endif /* BLOCK_LIST_H */
+
diff --git a/include/block/block_int-common.h
b/include/block/block_int-common.h
index 5a04c778e4..0266876a59 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -985,6 +985,20 @@ struct BlockDriverState {
bool force_share; /* if true, always allow all shared permissions */
bool implicit; /* if true, this filter node was automatically
inserted */
+ /*
+ * If true, the bs is reading the graph.
+ * No write should happen in the meanwhile.
+ * Atomic.
+ */
+ bool reading_graph;
+
+ /*
+ * If true, the main loop is modifying the graph.
+ * bs cannot read the graph.
+ * Protected by bs_graph_list_lock.
+ */
+ bool has_writer;
+
BlockDriver *drv; /* NULL means no media */
void *opaque;