> -----Original Message-----
> From: nick.a.mathew...@gmail.com [mailto:nick.a.mathew...@gmail.com] On
> Behalf Of Nick Mathewson
> Sent: Thursday, November 04, 2010 12:20 PM
> To: Gilad Benjamini
> Subject: Re: [Libevent-users] Read failures on Unix socket
>
> On Thu, Nov 4, 2010 at 2:09 PM, Gilad Benjamini
> <gi...@altornetworks.com> wrote:
> >> Hm. Looking at the epoll output in the "nonblocking" case, it doesn't
> >> look like Libevent is doing anything weird here: epoll_wait() is
> >> honest-to-goodness saying "Okay to read on fd 15"...
> >>

Looking again at the same output, I noticed something interesting (the output below shows only the strace lines that seem relevant):

- 22:55:21 dup(12) = 15
- 22:55:21 epoll_ctl(4, EPOLL_CTL_ADD, 15, {EPOLLIN|EPOLLOUT, {u32=15, u64=15}}) = 0
- 22:55:21 epoll_wait(4, {{EPOLLOUT, {u32=15, u64=15}}}, 32, 865) = 1
- 22:55:21 epoll_ctl(4, EPOLL_CTL_MOD, 15, {EPOLLIN, {u32=15, u64=15}}) = 0
- 22:55:21 epoll_wait(4, {{EPOLLIN, {u32=15, u64=15}}, {EPOLLIN, {u32=14, u64=14}}, {EPOLLIN, {u32=13, u64=13}}}, 32, 812) = 3
- 22:55:21 close(15) = 0
- 22:55:21 epoll_ctl(4, EPOLL_CTL_DEL, 15, {EPOLLIN, {u32=15, u64=15}}) = -1 EBADF (Bad file descriptor)
- 22:55:30 socket(PF_FILE, SOCK_DGRAM, 0) = 15
- 22:55:30 bind(15, {sa_family=AF_FILE, path="/var/log/snort/snort_alert"...}, 110) = 0
- 22:55:30 epoll_ctl(4, EPOLL_CTL_ADD, 15, {EPOLLIN, {u32=15, u64=15}}) = 0
- 22:55:51 epoll_wait(4, {{EPOLLIN, {u32=16, u64=16}}, {EPOLLIN, {u32=15, u64=15}}, {EPOLLIN, {u32=14, u64=14}}, {EPOLLIN, {u32=13, u64=13}}}, 32, 1000) = 4
- 22:55:51 recvfrom(15,
- ... at this point the application hangs

This led me to suspect that the event I am getting on fd 15 actually belongs to the previous owner of fd 15.

My first test was to replace "dup(fd)" with "dup2(fd,200+x) ; x++".
Result: The problem with the UNIX socket disappeared. Instead, there was a similar problem with a socket that had a 200+ file descriptor; i.e. the problem shifted to the duplicated descriptor.
Conclusion: Either epoll or epoll+libevent apparently delivers events to the wrong file descriptor.

My second test was triggered by another thing I saw. While my code deletes the event on the duplicated fd BEFORE closing the file descriptor, libevent actually deletes the descriptor from epoll at a later point. I understand there is some queuing mechanism involved.
I tried "convincing" my code to do things in the right order by deleting the event and then setting a timer to close the file descriptor 10 milliseconds later.
Result: The fd was deleted from the epoll set BEFORE it was closed, and my code seems to work perfectly.
Conclusion: libevent's delayed delete doesn't combine well with epoll.

Any chance of changing that? I could think of an API to flush the queue, but that seems like a solution that involves the libevent user too much in the implementation.

My apologies for the long mail. I'll be glad to hear your thoughts.
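
The ordering issue described above can probably be reproduced without libevent. What follows is a minimal sketch, assuming Linux epoll and relying on the behaviour documented in epoll(7): a registration goes away on close() only once every descriptor referring to the same open file description has been closed, so closing just the dup()'d number leaves the registration alive. The helper name event_after_close() and the use of a pipe are made up for illustration; neither comes from my application or from libevent.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of libevent.
 *
 * Registers a dup()'d pipe read end with epoll, closes the duplicate,
 * makes the pipe readable, and returns the fd number that epoll_wait()
 * reports (-1 if nothing is reported within 100 ms or on setup failure).
 * If del_first is nonzero, EPOLL_CTL_DEL is issued BEFORE close().
 * *closed_fd receives the number of the duplicate that was closed.
 */
int event_after_close(int del_first, int *closed_fd)
{
    int p[2], epfd = -1, dupfd = -1, reported = -1;
    struct epoll_event ev = {0}, got = {0};

    assert(closed_fd != NULL);
    *closed_fd = -1;

    if (pipe(p) < 0)
        return -1;

    epfd = epoll_create1(0);
    dupfd = dup(p[0]);          /* same open file description as p[0] */
    if (epfd < 0 || dupfd < 0)
        goto out;

    ev.events = EPOLLIN;
    ev.data.fd = dupfd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, dupfd, &ev) < 0)
        goto out;

    /* The "right order": remove the registration while the fd is valid. */
    if (del_first && epoll_ctl(epfd, EPOLL_CTL_DEL, dupfd, &ev) < 0)
        goto out;

    /* p[0] is still open, so this close() does not drop the registration. */
    close(dupfd);
    *closed_fd = dupfd;
    dupfd = -1;

    /* Make the read side readable. */
    if (write(p[1], "x", 1) != 1)
        goto out;

    /* Without the earlier DEL, the closed fd number is still reported. */
    if (epoll_wait(epfd, &got, 1, 100) == 1)
        reported = got.data.fd;

out:
    if (dupfd >= 0)
        close(dupfd);
    if (epfd >= 0)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return reported;
}
```

On a Linux kernel I would expect event_after_close(0, &fd) to return the already-closed number stored in fd, and event_after_close(1, &fd) to return -1. If that closed number is reused by a later socket(), as fd 15 is in the strace above, the stale event looks like it belongs to the new socket, which would explain the recvfrom() that never returns.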