Hi,

Our software relies heavily on epoll-based wait/event loops that mix "real" 
FDs (mostly sockets), eventfds and timerfds. After upgrading to NuttX 12.2.1 
I see some odd behaviour that I think can be traced to the optimizations in 
https://github.com/apache/nuttx/commit/25bfd437fe5b30d0221c68288e27fb7efc57fe0c


The behaviour is easy to trigger: create an epoll fd, add two timerfds with 
the same timeout to it, and then call epoll_wait in a loop. On the first 
iteration (on my board) epoll_wait returns 2, with both events having EPOLLIN 
set, as expected. The second time epoll_wait is called (with an infinite 
timeout), however, it returns immediately with 0 (zero), meaning no ready fds. 
As far as I can read the epoll_wait spec (from the Linux kernel), that is an 
invalid return value for an infinite wait.


Reading the code and trying to debug using prints (without affecting the 
timing) I think the following is what happens:


  1.  epoll_wait blocks in nxsem_wait, waiting for the semaphore to be 
signaled. Semcount is decreased, so it becomes -1.
  2.  timerfd1 expires, timerfd_timeout calls poll_notify, which in turn 
signals the semaphore. Semcount is increased by 1, so it becomes 0.
  3.  timerfd2 expires, timerfd_timeout calls poll_notify, which in turn 
signals the semaphore again. Semcount is increased by 1, so it becomes 1.
  4.  epoll_wait is woken up, processes the fds, and copies revents etc. to 
the output, indicating (correctly) that both fds have been affected. 
epoll_wait returns 2.
  5.  epoll_wait is called again. nxsem_wait is called, but as semcount is 
already 1 it returns immediately, so epoll_wait does not go to sleep. When 
the fds are processed, all revents are still zero, so epoll_wait returns 0 
(as there is in fact nothing ready to process).
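
The counting in the steps above can be replayed with a toy model. This is 
only a sketch of my reading of the sequence: model_sem, model_wait, 
model_post and replay_steps are hypothetical names, and the model tracks 
nothing but the semaphore count (it is not the NuttX implementation).

```c
#include <stdbool.h>

/* Toy counting semaphore: only the count is modelled */

struct model_sem
{
  int count;
};

/* Decrement the count; returns true if the caller would have to block */

bool model_wait(struct model_sem *sem)
{
  sem->count--;
  return sem->count < 0;
}

/* Increment the count, as one poll_notify signal would */

void model_post(struct model_sem *sem)
{
  sem->count++;
}

/* Replay steps 1-5. Returns true if the second epoll_wait would block,
 * and stores the final count in *final_count.
 */

bool replay_steps(int *final_count)
{
  struct model_sem sem = { 0 };
  bool blocks;

  blocks = model_wait(&sem);   /* Step 1: 0 -> -1, first wait blocks */
  model_post(&sem);            /* Step 2: timerfd1 expires, -1 -> 0 */
  model_post(&sem);            /* Step 3: timerfd2 expires, 0 -> 1 */
                               /* Step 4: first epoll_wait returns 2 */
  blocks = model_wait(&sem);   /* Step 5: 1 -> 0, second wait does not block */

  *final_count = sem.count;
  return blocks;
}
```

In this model replay_steps reports that the second wait does not block, 
because the extra signal from step 3 is still pending when step 5 starts.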

It is possible that epoll_wait is woken up before timerfd2 expires, but 
timerfd2 definitely expires before epoll_teardown has processed all the 
fds.

I am not sure about the correct way to handle this, and the setup/teardown 
optimizations in themselves seem good and desirable. Could someone who was 
involved in writing or reviewing the original patch comment on whether my 
analysis seems wrong, or suggest how this could be solved?

Regards
Marten Svanfeldt
