On 18/02/20 19:27, Stefan Hajnoczi wrote:
> The first rcu_read_lock/unlock() is expensive. Nested calls are cheap.
>
> This optimization increases IOPS from 73k to 162k with a Linux guest
> that has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32
> devices.
>
> Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
> ---
>  util/aio-posix.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index a4977f538e..f67f5b34e9 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -15,6 +15,7 @@
>
>  #include "qemu/osdep.h"
>  #include "block/block.h"
> +#include "qemu/rcu.h"
>  #include "qemu/rcu_queue.h"
>  #include "qemu/sockets.h"
>  #include "qemu/cutils.h"
> @@ -514,6 +515,16 @@ static bool run_poll_handlers_once(AioContext *ctx, int64_t *timeout)
>  {
>      bool progress = false;
>      AioHandler *node;
>
> +    /*
> +     * Optimization: ->io_poll() handlers often contain RCU read critical
> +     * sections and we therefore see many rcu_read_lock() -> rcu_read_unlock()
> +     * -> rcu_read_lock() -> ... sequences with expensive memory
> +     * synchronization primitives. Make the entire polling loop an RCU
> +     * critical section because nested rcu_read_lock()/rcu_read_unlock() calls
> +     * are cheap.
> +     */
> +    RCU_READ_LOCK_GUARD();
> +
>      QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
>          if (!node->deleted && node->io_poll &&
>              aio_node_check(ctx, node->is_external) &&
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>