I believe I have found a bug in epoll. It causes the behavior I described in 
earlier emails, and it arises from the interaction of epoll instances that 
share no files. 

I wrote a C program that behaves similarly to my original program and triggers 
the bug. The bug only shows up when I use enough cores and threads (about 16). 
The program is here: 
https://github.com/AndreasVoellmy/epollbug/blob/master/epollbug.c  It is a 
super-stripped-down HTTP server. It uses a number of worker threads that serve 
requests, each with its own epoll instance. There is also a "wakeup" thread 
that simply monitors an eventfd and reads from it whenever it is woken. Every 
worker thread writes to that eventfd after it processes a request. This 
probably seems like a strange program, but something like this came up in a 
real system. 
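
To make the setup concrete, here is a minimal sketch of the pattern the 
program uses (this is not the actual epollbug.c; the names wakeup_fd, 
wakeup_epfd and notify_wakeup are mine, and most error handling is omitted): 

/* Sketch of the wakeup pattern: one eventfd, one epoll instance that watches
 * only that eventfd, one thread that blocks in epoll_wait on it and drains
 * the eventfd, and a notify function that worker threads call after serving
 * each request.  Compile with -pthread. */
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static int wakeup_fd;    /* eventfd written by the worker threads */
static int wakeup_epfd;  /* epoll instance watching only the eventfd */

/* The "wakeup" thread: wait for the eventfd to become readable, then read
 * (drain) it, forever. */
static void *wakeup_thread(void *arg)
{
    (void)arg;
    for (;;) {
        struct epoll_event ev;
        if (epoll_wait(wakeup_epfd, &ev, 1, -1) == 1) {
            uint64_t val;
            if (read(wakeup_fd, &val, sizeof(val)) != sizeof(val))
                perror("read eventfd");
        }
    }
    return NULL;
}

/* Called by every worker thread after it finishes serving a request; this is
 * the very frequent eventfd write that seems to provoke the problem. */
static void notify_wakeup(void)
{
    uint64_t one = 1;
    if (write(wakeup_fd, &one, sizeof(one)) != sizeof(one))
        perror("write eventfd");
}

int main(void)
{
    wakeup_fd = eventfd(0, 0);
    wakeup_epfd = epoll_create1(0);

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = wakeup_fd };
    epoll_ctl(wakeup_epfd, EPOLL_CTL_ADD, wakeup_fd, &ev);

    pthread_t t;
    pthread_create(&t, NULL, wakeup_thread, NULL);

    /* The worker threads (not shown here) each run their own epoll loop over
     * their client sockets and call notify_wakeup() after every request. */
    notify_wakeup();
    pause();
    return 0;
}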

I test the program using the weighttp HTTP request generator 
(http://redmine.lighttpd.net/projects/weighttp/wiki). You need enough 
requests, enough concurrent clients, and enough worker threads to trigger the 
problem. For example, I run with './weighttp -n 400000 -c 500 -t 6 
-k "10.12.0.1:8080"'. With 16 cores for the server program (epollbug.c), this 
workload triggers the bug about once in every three runs.  The server 
(epollbug.c) is hardcoded to expect the specific request that weighttp sends, 
so you need to find out what weighttp sends from your test machine and put 
that at the top of epollbug.c (you will see where it goes). To do this, 
uncomment the SHOW_DEBUG flag at the top of the program, run weighttp against 
it, and it will print the request weighttp is sending. Then set 
EXPECTED_HTTP_REQUEST to whatever you get.
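
As a rough illustration only (this is not quoted from the file, and the 
request string is just a placeholder), the two settings look something like 
this: 

/* Uncomment to make the server print each incoming request verbatim. */
/* #define SHOW_DEBUG */

/* Placeholder only -- substitute the exact request that weighttp sends from
 * your test machine, as printed when SHOW_DEBUG is enabled. */
#define EXPECTED_HTTP_REQUEST \
    "GET / HTTP/1.1\r\nHost: 10.12.0.1:8080\r\nConnection: keep-alive\r\n\r\n"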

I am running Linux 3.4.0.0. 

Cheers, 
Andi

On Dec 13, 2012, at 10:29 AM, Andreas Voellmy <andreas.voel...@yale.edu> wrote:

> Hi Eric, 
> 
> On Dec 13, 2012, at 4:32 AM, Eric Wong <normalper...@yhbt.net> wrote:
> 
>> Andreas Voellmy <andreas.voel...@yale.edu> wrote:
>> 
>>>> Another thread, distinct from all of the threads serving particular
>>>> sockets, is performing epoll_wait calls. When sockets are returned as
>>>> ready from an epoll_wait call, this thread signals the condition
>>>> variable for the socket.
>> 
>> Perhaps there is a bug in the way your epoll_wait thread
>> uses the condition variable to notify other threads?
>> 
> 
> This is possible; I've tried very hard (e.g., I added assertions to check 
> various error conditions) to ensure that there is no problem in signaling 
> the other threads. From everything I can tell, the signaling is working 
> properly.
> 
>> 
>>>> The problem I am encountering is that sometimes a thread will block
>>>> waiting for the readiness signal and will never get notified, even
>>>> though there is data to be read. This behavior seems to go away when
>>>> I remove the EPOLLONESHOT flag when registering the event. 
>> 
>> Is the thread the one waiting on the condition variable or epoll_wait?
>> In your situation (stream I/O via multiple threads, single epoll
>> descriptor), I think EPOLLONESHOT is the /only/ sane thing to do.
> 
> The one waiting on the condition variable.
> 
> I think I've narrowed down the problem a bit more. In my program I have 
> multiple epoll instances. Most of the epoll instances are for monitoring 
> sockets. One is used for monitoring an eventfd that is written to by other 
> threads. The problem only occurs when I write to the eventfd after servicing 
> each HTTP request on a socket; i.e., the epoll instance monitoring the 
> eventfd is returning from a blocking epoll_wait call very frequently. If I 
> don't do that write, or if I use a different notification facility (for 
> example, poll) to monitor the eventfd, then the problem goes away. So it 
> looks like there 
> may be some way in which different epoll instances can interfere with each 
> other. 
> 
> Probably this setup sounds weird to you, but I'm trying to spare you from 
> having to understand my whole application; this is part of a multicore 
> runtime system for a programming language with user-level threads, and 
> explaining the full story would probably take more time than you want to 
> spend. But I can provide more detail if you like. 
> 
> -Andi

