On Mon, Dec 12, 2016 at 2:34 PM, Eric Wong <e...@80x24.org> wrote: > +Cc folks who may know about timer stuff on epoll. > > Dmitry Banschikov <m...@ubique.spb.ru> wrote: >> Hi! >> >> I have a problem caused by inaccurate timeouts in epoll_wait(2). >> Here are some parts of strace -tt output: > > Which kernel version are you using?
I was able to reproduce this problem in 4.4.0(Ubuntu 16.04) and 3.10.0(CentOS 7.2.1511) on two different x86_64 machines. > >> 22578 09:33:46.959791 epoll_wait(5, <unfinished ...> >> 22578 09:33:50.010794 <... epoll_wait resumed> [], 128, 1498) = 0 >> ... >> 22034 09:35:07.686896 epoll_wait(5, <unfinished ...> >> 22034 09:35:09.482526 <... epoll_wait resumed> [{EPOLLIN, >> {u32=151458248, u64=151458248}}], 128, 362) = 1 >> ... >> 22036 09:35:41.433241 epoll_wait(5, <unfinished ...> >> 22036 09:35:43.176881 <... epoll_wait resumed> [], 128, 97) = 0 >> >> In each example epoll_wait is blocked for too longer then asked in timeout. >> >> Is it normal? > > I don't think so, unless you have a huge /proc/<pid>/timerslack_ns > set. But I mainly use -1 or 0 as the timeout value. /proc/<pid>/timerslack_ns interface was added in 4.6. For sure I can try to reproduce problem on fresh kernel if it can help debugging. I observe such epoll_wait behavior quite rarely - usually one/two/three times per hours of attempts to reproduce. And from strace output I can see that system is not in state of resource starvation - because other threads do some work between call and return to/from epoll_wait. Such timeout values for epoll_wait are generated by Boost ASIO library. Internally it uses priority queue for storing timer events and use timeout to nearest event from queue in epoll_wait. What information can help to debug this issue? > >> Please CC me. > > That's standard procedure, here :) -- Dmitry Banschikov