I have a server using evhttp, which was running quite stably on 2.0.8.
It uses a pthreads-based worker pool, but accepts HTTP connections in
the network thread, using one event base.
On 2.0.10 and 2.0.11 it now crashes or locks up very quickly.
The code to start up the server in the network thread is pretty routine:
evthread_use_pthreads();
base = event_base_new();
http = evhttp_new(base);
evthread_make_base_notifiable(base);
evhttp_set_gencb(http, LENetwork::NetworkHandler, this);
evhttp_bind_socket(http, "0.0.0.0", portnum);
event_base_dispatch(base);
The callback immediately hands off the *evhttp_request* pointer to a
worker, which then winds up calling *evhttp_send_reply* (in that worker
thread, not the network thread).
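For illustration only, the handoff pattern described above can be sketched roughly as follows. This is not the actual server code: `struct request` is a hypothetical stand-in for `struct evhttp_request`, the queue and function names are made up, and the point where the real worker would call `evhttp_send_reply` is only marked with a comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct evhttp_request (the real type is libevent's). */
struct request { int id; int replied; };

#define QUEUE_CAP 16

/* Bounded FIFO shared between the network thread and the workers. */
struct work_queue {
    struct request *items[QUEUE_CAP];
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
};

static void queue_init(struct work_queue *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Network-thread side: what the generic HTTP callback would do with the
 * request pointer. Note that this sketch blocks while the queue is full. */
static void queue_push(struct work_queue *q, struct request *req) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = req;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Worker side: blocks until a request pointer is available. */
static struct request *queue_pop(struct work_queue *q) {
    struct request *req;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    req = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return req;
}

/* Worker loop; a NULL pointer is used here as a shutdown marker. */
static void *worker_main(void *arg) {
    struct work_queue *q = arg;
    struct request *req;
    assert(q != NULL);
    while ((req = queue_pop(q)) != NULL) {
        /* The real worker ends up calling evhttp_send_reply() here,
         * i.e. outside the thread that runs the event base. */
        req->replied = 1;
    }
    return NULL;
}
```

The relevant property is that the reply is issued from the worker thread, so anything the reply path touches on the connection is accessed from outside the thread that runs the event loop.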
Running under libevent-2.0.11, my Linux machine just locks up and stops
accepting connections, and I can't find much going on in gdb.
Helgrind complains about things like this:
==29140== Possible data race during write of size 4 at 0x40b51d4 by thread #15
==29140==    at 0x810175F: bufferevent_setcb (bufferevent.c:351)
On OpenSolaris, the problem takes more time to show, but I can force
consistent segfaults. The crashing stack appears to be:
#0 bufferevent_setcb (bufev=0x6b696d2f, readcb=0, writecb=0x8148110
<evhttp_write_cb>, eventcb=0x814cf60 <evhttp_error_cb>, cbarg=0x82a06a8)
at bufferevent.c:345
345 BEV_LOCK(bufev);
Current language: auto; currently c
(gdb) bt
#0 bufferevent_setcb (bufev=0x6b696d2f, readcb=0, writecb=0x8148110
<evhttp_write_cb>, eventcb=0x814cf60 <evhttp_error_cb>, cbarg=0x82a06a8)
at bufferevent.c:345
#1 0x0814adff in evhttp_write_buffer (evcon=0x82a06a8, cb=<value
optimized out>, arg=<value optimized out>) at http.c:377
So both platforms are reporting some problem with *bufferevent_setcb*.
*Questions:* Is bufferevent supposed to be thread-safe for this case? I
suppose I haven't verified whether my technique of handing an
evhttp_request across the thread boundary is supposed to work or not...
it has just been working for quite a long time under earlier versions of
libevent.
Why did it work so reliably in 2.0.8 and 2.0.7 and not in 2.0.10/11?
Any help fixing this or tracking it down further would be appreciated.
thanks,
mike