On Wed, 07/08 11:58, Stefan Hajnoczi wrote:
> On Wed, Jul 08, 2015 at 09:01:27AM +0800, Fam Zheng wrote:
> > On Tue, 07/07 16:08, Stefan Hajnoczi wrote:
> > > > +#define EPOLL_BATCH 128
> > > > +static bool aio_poll_epoll(AioContext *ctx, bool blocking)
> > > > +{
> > > > +    AioHandler *node;
> > > > +    bool was_dispatching;
> > > > +    int i, ret;
> > > > +    bool progress;
> > > > +    int64_t timeout;
> > > > +    struct epoll_event events[EPOLL_BATCH];
> > > > +
> > > > +    aio_context_acquire(ctx);
> > > > +    was_dispatching = ctx->dispatching;
> > > > +    progress = false;
> > > > +
> > > > +    /* aio_notify can avoid the expensive event_notifier_set if
> > > > +     * everything (file descriptors, bottom halves, timers) will
> > > > +     * be re-evaluated before the next blocking poll(). This is
> > > > +     * already true when aio_poll is called with blocking == false;
> > > > +     * if blocking == true, it is only true after poll() returns.
> > > > +     *
> > > > +     * If we're in a nested event loop, ctx->dispatching might be true.
> > > > +     * In that case we can restore it just before returning, but we
> > > > +     * have to clear it now.
> > > > +     */
> > > > +    aio_set_dispatching(ctx, !blocking);
> > > > +
> > > > +    ctx->walking_handlers++;
> > > > +
> > > > +    timeout = blocking ? aio_compute_timeout(ctx) : 0;
> > > > +
> > > > +    if (timeout > 0) {
> > > > +        timeout = DIV_ROUND_UP(timeout, 1000000);
> > > > +    }
> > >
> > > I think you already posted the timerfd code in an earlier series. Why
> > > degrade to millisecond precision? It needs to be fixed up anyway if the
> > > main loop uses aio_poll() in the future.
> >
> > Because of a little complication: timeout here is always -1 for iothread,
> > and what is interesting is that -1 actually requires an explicit
> >
> >     timerfd_settime(timerfd, flags, &(struct itimerspec){{0, 0}}, NULL)
> >
> > to disable timerfd for this aio_poll(), which costs something. Passing -1
> > to epoll_wait() without this doesn't work because the timerfd is already
> > added to the epollfd and may have an unexpected timeout set before.
> >
> > Of course we can cache the state and optimize, but I've not reasoned about
> > what happens if another thread calls aio_poll() while we're in epoll_wait(),
> > for example when the first aio_poll() has a positive timeout but the second
> > one has -1.
>
> I'm not sure I understand the threads scenario since aio_poll_epoll()
> has a big aio_context_acquire()/release() region that protects it, but I
> guess the nested aio_poll() case is similar. Care needs to be taken so
> the extra timerfd state stays consistent.
Nested aio_poll() has no race on the timerfd because the outer aio_poll()'s
epoll_wait() has already returned by the time the inner aio_poll() runs.
Threads are a different matter with Paolo's "release AioContext around
blocking aio_poll()" change.

> The optimization can be added later unless the timerfd_settime() syscall
> is so expensive that it defeats the advantage of epoll().

That's the plan, and it must be done before this gets used by the main loop.

Fam
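
For reference, below is a minimal sketch of the caching idea discussed above,
not the actual QEMU patch: remember whether the timerfd registered with the
epollfd is currently armed, so that a -1 timeout only pays for the disarming
timerfd_settime() when an earlier aio_poll() actually left a timer programmed.
The struct, field, and function names (epoll_state, timerfd_armed,
epoll_update_timeout) are hypothetical, for illustration only.

/*
 * Sketch only, hypothetical names: cache the armed/disarmed state of the
 * timerfd that is registered with the epollfd, so a "block forever" poll
 * can skip the extra timerfd_settime() syscall when nothing is armed.
 */
#include <stdbool.h>
#include <stdint.h>
#include <sys/timerfd.h>

struct epoll_state {
    int epollfd;
    int timerfd;
    bool timerfd_armed;   /* true if a non-zero expiration is programmed */
};

/* Program (or disarm) the timerfd for a timeout given in nanoseconds;
 * timeout_ns < 0 means "block forever", so the timerfd must not fire. */
static int epoll_update_timeout(struct epoll_state *s, int64_t timeout_ns)
{
    struct itimerspec its = { { 0, 0 }, { 0, 0 } };

    if (timeout_ns < 0) {
        /* Only issue the disarming timerfd_settime() if the timerfd is
         * still armed from an earlier aio_poll(); otherwise skip the
         * syscall entirely. */
        if (!s->timerfd_armed) {
            return 0;
        }
        s->timerfd_armed = false;
    } else {
        its.it_value.tv_sec = timeout_ns / 1000000000LL;
        its.it_value.tv_nsec = timeout_ns % 1000000000LL;
        s->timerfd_armed = timeout_ns > 0;
    }

    /* A zero it_value disarms the timer, so the same call covers both
     * "arm with nanosecond precision" and "disable". */
    return timerfd_settime(s->timerfd, 0, &its, NULL);
}

In the nested case above the cached flag cannot go stale, because the outer
epoll_wait() has already returned; once the AioContext is released around
blocking aio_poll(), this flag is exactly the extra timerfd state that has
to be kept consistent across threads.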