On Tue, Jun 30, 2015 at 09:19:45PM +0800, Fam Zheng wrote:
> =====================================================================
> # of scsi-disks |       master        |        epoll
>                 |  rd    wr   randrw  |  rd    wr   randrw
> ---------------------------------------------------------------------
>    1            |  103   96     49    |  105   99     49
>    4            |   92   96     48    |  103   98     49
>    8            |   96   94     46    |  101   97     50
>   16            |   91   91     45    |  101   95     48
>   32            |   84   83     40    |   95   95     48
>   64            |   75   73     35    |   91   90     44
>  128            |   54   53     26    |   79   80     39
>  256            |   41   39     19    |   63   62     30
> =====================================================================
Nice results!

> @@ -44,6 +47,12 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
>  
>  void aio_context_setup(AioContext *ctx, Error **errp)
>  {
> +#ifdef CONFIG_EPOLL
> +    ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
> +    if (ctx->epollfd < 0) {
> +        error_setg(errp, "Failed to create epoll fd: %s", strerror(errno));

Slightly more concise:

  error_setg_errno(errp, errno, "Failed to create epoll fd");

> -/* These thread-local variables are used only in a small part of aio_poll
> +#ifdef CONFIG_EPOLL
> +QEMU_BUILD_BUG_ON((int)G_IO_IN != EPOLLIN);
> +QEMU_BUILD_BUG_ON((int)G_IO_OUT != EPOLLOUT);
> +QEMU_BUILD_BUG_ON((int)G_IO_PRI != EPOLLPRI);
> +QEMU_BUILD_BUG_ON((int)G_IO_ERR != EPOLLERR);
> +QEMU_BUILD_BUG_ON((int)G_IO_HUP != EPOLLHUP);

This assumption is probably okay, but note that the compiler can optimize:

  event.events = (node->pfd.events & G_IO_IN  ? EPOLLIN  : 0) |
                 (node->pfd.events & G_IO_OUT ? EPOLLOUT : 0) |
                 (node->pfd.events & G_IO_PRI ? EPOLLPRI : 0) |
                 (node->pfd.events & G_IO_ERR ? EPOLLERR : 0) |
                 (node->pfd.events & G_IO_HUP ? EPOLLHUP : 0);

into:

  event.events = node->pfd.events & (EPOLLIN | EPOLLOUT | EPOLLPRI |
                                     EPOLLERR | EPOLLHUP);

which is just an AND instruction, so it's effectively free and doesn't
assume that these constants have the same values.

> +
> +#define EPOLL_BATCH 128
> +static bool aio_poll_epoll(AioContext *ctx, bool blocking)
> +{
> +    AioHandler *node;
> +    bool was_dispatching;
> +    int i, ret;
> +    bool progress;
> +    int64_t timeout;
> +    struct epoll_event events[EPOLL_BATCH];
> +
> +    aio_context_acquire(ctx);
> +    was_dispatching = ctx->dispatching;
> +    progress = false;
> +
> +    /* aio_notify can avoid the expensive event_notifier_set if
> +     * everything (file descriptors, bottom halves, timers) will
> +     * be re-evaluated before the next blocking poll().  This is
> +     * already true when aio_poll is called with blocking == false;
> +     * if blocking == true, it is only true after poll() returns.
> +     *
> +     * If we're in a nested event loop, ctx->dispatching might be true.
> +     * In that case we can restore it just before returning, but we
> +     * have to clear it now.
> +     */
> +    aio_set_dispatching(ctx, !blocking);
> +
> +    ctx->walking_handlers++;
> +
> +    timeout = blocking ? aio_compute_timeout(ctx) : 0;
> +
> +    if (timeout > 0) {
> +        timeout = DIV_ROUND_UP(timeout, 1000000);
> +    }

I think you already posted the timerfd code in an earlier series.  Why
degrade to millisecond precision?  It needs to be fixed up anyway if the
main loop uses aio_poll() in the future.