Well that was fast! I made the change to my local copy (deleting one character seemed simpler than checking out from SVN) and tried it out. Works like a charm!
Thanks, Christian! Ken On Thu, Jun 26, 2014 at 3:07 PM, Christian Grothoff <[email protected]> wrote: > Found it, very simple off-by-one (character, not index): > > grothoff@spec:~/svn/libmicrohttpd/src/microhttpd$ svn diff > Index: daemon.c > =================================================================== > --- daemon.c (revision 33841) > +++ daemon.c (working copy) > @@ -3996,7 +3996,7 @@ > if ( (0 != (daemon->options & MHD_USE_THREAD_PER_CONNECTION)) && > (MHD_YES != MHD_mutex_lock_ (&daemon->cleanup_connection_mutex)) ) > MHD_PANIC ("Failed to acquire cleanup mutex\n"); > - for (pos = daemon->connections_head; NULL != pos; pos = pos->nextX) > + for (pos = daemon->connections_head; NULL != pos; pos = pos->next) > shutdown (pos->socket_fd, > (pos->read_closed == MHD_YES) ? SHUT_WR : SHUT_RDWR); > if ( (0 != (daemon->options & MHD_USE_THREAD_PER_CONNECTION)) && > > > 'nextX' is the linked list with connections waiting on timeout, > 'next' is the linked list with all connections; by going for 'nextX' > we fail to call shutdown on 'some' FDs, and thus the threads may > not terminate unless the client terminates them. In your case, the > browser keeps the connections open for 120s. > > Fixed in SVN 33874. > > Thanks for reporting & the testcase! > > -Christian > > On 06/26/2014 07:22 PM, Kenneth Mastro wrote: > > Hi Christian et all, > > > > It took me a few weeks to get back to this, but I managed to write up a > > test case today to demonstrate this 'delayed shutdown' problem. I sent > > Christian a PM with the test case rather then sending the entire list a > > tar.gz file (if it would even let me). > > > > The problem only seems to occur when using the 'thread per connection' > > threading option. It also has nothing to do with 'POST' processing or a > > 'request completed' callback - my test case doesn't have either of those > > things. Just simple 'GET's of images - not even any query string > parsing. > > Otherwise, I have nothing to offer except that it might be more likely to > > occur under heavier loads. Having the browser request 30 images > (specified > > in a single HTML file) seems to get it to happen every time on my dev > > machine. > > > > Oh - and I also updated to version 0.9.37, which made no difference (I > was > > using 0.9.35 when I saw the behavior). I'm running Ubuntu 12.04 LTS. > > > > Hope this helps. > > > > Christian - if you didn't get my email, please let me know. I sent it > just > > prior to sending this list email out. > > > > > > Thanks again, > > Ken > > > > > > > > > > On Sat, Jun 7, 2014 at 11:05 AM, Christian Grothoff <[email protected] > > > > wrote: > > > >> On 06/07/2014 04:30 PM, Kenneth Mastro wrote: > >>> Hi Christian, > >>> > >>> Thanks for the quick reply. > >>> > >>> And erk! Sorry about the OS thing. I'm on Linux. My development > >>> environment is a stock install of Ubuntu 12.04 LTS VM on a Windows 7 > PC. > >>> (I doubt the VM thing matters at all, but figured I'd mention it just > in > >>> case.) > >> > >> I doubt it as well. > >> > >>> Here's my MHD_start_daemon code, nothing earth shattering ( I assume > >> that's > >>> where the per-connection timeout would be set): > >>> ----------------------- > >>> myDaemon = MHD_start_daemon(MHD_USE_THREAD_PER_CONNECTION, // one > thread > >>> per connection > >>> WEB_TEST_PORT, // port > to > >>> listen on > >>> nullptr, // > accept > >>> policy callback > >>> nullptr, // extra > >> arg > >>> to accept policy callback > >>> &connectionCallbackC, // main > >>> 'connection' callback > >>> this, // extra > >>> argument to the 'connection' callback > >>> MHD_OPTION_NOTIFY_COMPLETED, // > >> specifies > >>> that the next arg is the callback to call when the connection is done > >>> &requestCompletedCallbackC, // the > >>> callback to call when the connection is done > >>> this, // extra > >> arg > >>> to the callback when the connection is done > >>> MHD_OPTION_CONNECTION_LIMIT, // > >> specifies > >>> that the next arg is the max number of simultaneous connections > >>> (unsigned int)20, // the > >>> number of permitted simultaneous connections > >>> MHD_OPTION_END); // no > more > >>> options in the arg list > >>> ---------------------- > >>> > >>> So - no timeout changes for MHD. I haven't changed the default timeout > >> for > >>> the kernel, so I'm guessing that 115 (awfully close to 120, i.e., 2 > >>> minutes) is something else? I agree completely that it's likely some > >> kind > >>> of timeout, though. > >> > >> You might want to check /proc/sys/net/ipv4/tcp_keepalive_time, just a > >> random guess, though. > >> > >>> I don't change any thread timeouts at all (although I have some other > >>> threads in my application that I set to real-time priority for some > audio > >>> processing, but I can't imagine how that would adversely affect MHD). > >>> > >>> If I didn't mention it before, I did trace this down to the 'join' of > the > >>> thread in 'close_all_connections'. I assume 'MHD_join_thread_' is > >> actually > >>> pthread_join since I'm on Linux. My gut was telling me that somehow > the > >>> thread isn't being told to stop as part of the shutdown process and is > >>> eventually just timing out somewhere inside MHD (or via TCP as you > >>> suggested). > >> > >> Right, but the interesting question is not where MHD_stop_daemon hangs, > >> but what that other thread is doing at the time > >> (gdb threads, thread #NUM, ba; alternatively, strace -f BINARY to see > >> what system call returns after 120s might also help a bit...). > >> > >> netstat -ntpl to see if some connection somehow is still open might also > >> be useful (alternatively, lsof -p). > >> > >>> Before I noticed this problem, I was running with > >>> 'MHD_USE_SELECT_INTERNALLY' with a thread-pool, but you had suggested > in > >> a > >>> previous post (a few weeks ago) that that won't work properly with > >>> 'comet-like' requests unless I do the suspend/resume functionality. > >> That's > >>> perfectly fine and good, but it seems worth mentioning that I didn't > >> notice > >>> this problem when using that thread mode. From looking at the > >>> 'close_all_connections' code in daemon.c, I can see why the behavior > >> could > >>> be different. > >>> > >>> > >>> Anyway - I'll play around with the HAVE_LISTEN_SHUTDOWN option, see if > >> that > >>> makes any difference. Failing that, I'll see if I can create a test > >> case. > >>> That make take a couple days, though - lots of code to strip out to > keep > >> it > >>> simple. What should I do once I get a concise test case ready to go? > >> Send > >>> the code to the mailing list? > >> > >> List is fine (if it's not megabytes ;-)), PM also. Whichever you're > >> comfortable with. > >> > >> Happy hacking! > >> > >> Christian > >> > >> > > > >
