Well that was fast!

I made the change to my local copy (deleting one character seemed simpler
than checking out from SVN) and tried it out.  Works like a charm!

Thanks, Christian!


Ken



On Thu, Jun 26, 2014 at 3:07 PM, Christian Grothoff <[email protected]>
wrote:

> Found it, very simple off-by-one (character, not index):
>
> grothoff@spec:~/svn/libmicrohttpd/src/microhttpd$ svn diff
> Index: daemon.c
> ===================================================================
> --- daemon.c    (revision 33841)
> +++ daemon.c    (working copy)
> @@ -3996,7 +3996,7 @@
>    if ( (0 != (daemon->options & MHD_USE_THREAD_PER_CONNECTION)) &&
>         (MHD_YES != MHD_mutex_lock_ (&daemon->cleanup_connection_mutex)) )
>      MHD_PANIC ("Failed to acquire cleanup mutex\n");
> -  for (pos = daemon->connections_head; NULL != pos; pos = pos->nextX)
> +  for (pos = daemon->connections_head; NULL != pos; pos = pos->next)
>      shutdown (pos->socket_fd,
>               (pos->read_closed == MHD_YES) ? SHUT_WR : SHUT_RDWR);
>    if ( (0 != (daemon->options & MHD_USE_THREAD_PER_CONNECTION)) &&
>
>
> 'nextX' is the linked list with connections waiting on timeout,
> 'next' is the linked list with all connections; by going for 'nextX'
> we fail to call shutdown on 'some' FDs, and thus the threads may
> not terminate unless the client terminates them.  In your case, the
> browser keeps the connections open for 120s.
>
> Fixed in SVN 33874.
>
> Thanks for reporting & the testcase!
>
> -Christian
>
> On 06/26/2014 07:22 PM, Kenneth Mastro wrote:
> > Hi Christian et all,
> >
> > It took me a few weeks to get back to this, but I managed to write up a
> > test case today to demonstrate this 'delayed shutdown' problem.  I sent
> > Christian a PM with the test case rather then sending the entire list a
> > tar.gz file (if it would even let me).
> >
> > The problem only seems to occur when using the 'thread per connection'
> > threading option.  It also has nothing to do with 'POST' processing or a
> > 'request completed' callback - my test case doesn't have either of those
> > things.  Just simple 'GET's of images - not even any query string
> parsing.
> > Otherwise, I have nothing to offer except that it might be more likely to
> > occur under heavier loads.  Having the browser request 30 images
> (specified
> > in a single HTML file) seems to get it to happen every time on my dev
> > machine.
> >
> > Oh - and I also updated to version 0.9.37, which made no difference (I
> was
> > using 0.9.35 when I saw the behavior).  I'm running Ubuntu 12.04 LTS.
> >
> > Hope this helps.
> >
> > Christian - if you didn't get my email, please let me know.  I sent it
> just
> > prior to sending this list email out.
> >
> >
> > Thanks again,
> > Ken
> >
> >
> >
> >
> > On Sat, Jun 7, 2014 at 11:05 AM, Christian Grothoff <[email protected]
> >
> > wrote:
> >
> >> On 06/07/2014 04:30 PM, Kenneth Mastro wrote:
> >>> Hi Christian,
> >>>
> >>> Thanks for the quick reply.
> >>>
> >>> And erk!  Sorry about the OS thing.  I'm on Linux.  My development
> >>> environment is a stock install of Ubuntu 12.04 LTS VM on a Windows 7
> PC.
> >>> (I doubt the VM thing matters at all, but figured I'd mention it just
> in
> >>> case.)
> >>
> >> I doubt it as well.
> >>
> >>> Here's my MHD_start_daemon code, nothing earth shattering ( I assume
> >> that's
> >>> where the per-connection timeout would be set):
> >>> -----------------------
> >>> myDaemon = MHD_start_daemon(MHD_USE_THREAD_PER_CONNECTION, // one
> thread
> >>> per connection
> >>>                                 WEB_TEST_PORT,                 // port
> to
> >>> listen on
> >>>                                 nullptr,                       //
> accept
> >>> policy callback
> >>>                                 nullptr,                       // extra
> >> arg
> >>> to accept policy callback
> >>>                                 &connectionCallbackC,          // main
> >>> 'connection' callback
> >>>                                 this,                          // extra
> >>> argument to the 'connection' callback
> >>>                                 MHD_OPTION_NOTIFY_COMPLETED,   //
> >> specifies
> >>> that the next arg is the callback to call when the connection is done
> >>>                                 &requestCompletedCallbackC,    // the
> >>> callback to call when the connection is done
> >>>                                 this,                          // extra
> >> arg
> >>> to the callback when the connection is done
> >>>                                 MHD_OPTION_CONNECTION_LIMIT,   //
> >> specifies
> >>> that the next arg is the max number of simultaneous connections
> >>>                                 (unsigned int)20,              // the
> >>> number of permitted simultaneous connections
> >>>                                 MHD_OPTION_END);               // no
> more
> >>> options in the arg list
> >>> ----------------------
> >>>
> >>> So - no timeout changes for MHD.  I haven't changed the default timeout
> >> for
> >>> the kernel, so I'm guessing that 115 (awfully close to 120, i.e., 2
> >>> minutes) is something else?  I agree completely that it's likely some
> >> kind
> >>> of timeout, though.
> >>
> >> You might want to check /proc/sys/net/ipv4/tcp_keepalive_time, just a
> >> random guess, though.
> >>
> >>> I don't change any thread timeouts at all (although I have some other
> >>> threads in my application that I set to real-time priority for some
> audio
> >>> processing, but I can't imagine how that would adversely affect MHD).
> >>>
> >>> If I didn't mention it before, I did trace this down to the 'join' of
> the
> >>> thread in 'close_all_connections'.  I assume 'MHD_join_thread_' is
> >> actually
> >>> pthread_join since I'm on Linux.  My gut was telling me that somehow
> the
> >>> thread isn't being told to stop as part of the shutdown process and is
> >>> eventually just timing out somewhere inside MHD (or via TCP as you
> >>> suggested).
> >>
> >> Right, but the interesting question is not where MHD_stop_daemon hangs,
> >> but what that other thread is doing at the time
> >> (gdb threads, thread #NUM, ba; alternatively, strace -f BINARY to see
> >> what system call returns after 120s might also help a bit...).
> >>
> >> netstat -ntpl to see if some connection somehow is still open might also
> >> be useful (alternatively, lsof -p).
> >>
> >>> Before I noticed this problem, I was running with
> >>> 'MHD_USE_SELECT_INTERNALLY' with a thread-pool, but you had suggested
> in
> >> a
> >>> previous post (a few weeks ago) that that won't work properly with
> >>> 'comet-like' requests unless I do the suspend/resume functionality.
> >>  That's
> >>> perfectly fine and good, but it seems worth mentioning that I didn't
> >> notice
> >>> this problem when using that thread mode.  From looking at the
> >>> 'close_all_connections' code in daemon.c, I can see why the behavior
> >> could
> >>> be different.
> >>>
> >>>
> >>> Anyway - I'll play around with the HAVE_LISTEN_SHUTDOWN option, see if
> >> that
> >>> makes any difference.  Failing that, I'll see if I can create a test
> >> case.
> >>> That make take a couple days, though - lots of code to strip out to
> keep
> >> it
> >>> simple.  What should I do once I get a concise test case ready to go?
> >>  Send
> >>> the code to the mailing list?
> >>
> >> List is fine (if it's not megabytes ;-)), PM also. Whichever you're
> >> comfortable with.
> >>
> >> Happy hacking!
> >>
> >> Christian
> >>
> >>
> >
>
>

Reply via email to