Hi Maxim,

Thanks for the quick patch! I've applied it to our server and will monitor the results. The problem usually starts within 1-2 hours of a restart, so I'll post an update later today.

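
For anyone reading the archive later, here is my rough understanding of why the one-line change in the patch quoted below could matter. This is only a sketch under my own assumptions, not nginx code: link_t, reuse_link, chain_length and demo are hypothetical names, and I have not verified that this is exactly what happens inside ngx_http_set_keepalive. The idea is that a chain link reused without resetting its next pointer may still point into an old chain, so a loop that walks the chain until it reaches NULL might never terminate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a chain link; NOT nginx's actual ngx_chain_t. */
typedef struct link_s link_t;
struct link_s {
    int     *buf;
    link_t  *next;
};

/* Walk a chain and count its links. Gives up after `limit` steps and
 * returns -1, so a cyclic chain is reported instead of spinning forever. */
static int
chain_length(const link_t *head, int limit)
{
    int n = 0;

    for (const link_t *cl = head; cl != NULL; cl = cl->next) {
        if (++n > limit) {
            return -1;
        }
    }
    return n;
}

/* Reuse `cl` as the only link of a new chain. With clear_next == 0 the
 * stale next pointer is kept, mimicking the code before the patch. */
static link_t *
reuse_link(link_t *cl, int *buf, int clear_next)
{
    assert(cl != NULL);

    cl->buf = buf;
    if (clear_next) {
        cl->next = NULL;
    }
    return cl;
}

/* Set up two links that still point at each other from an earlier use
 * (an assumed scenario), reuse one of them, and measure the new chain. */
static int
demo(int clear_next)
{
    static int data = 0;
    link_t a, b;

    a.buf = NULL; a.next = &b;
    b.buf = NULL; b.next = &a;   /* stale pointers form a cycle */

    link_t *busy = reuse_link(&a, &data, clear_next);
    return chain_length(busy, 16);
}
```

In this toy version, demo(1) sees a chain of exactly one link, while demo(0) hits the step limit because the walk keeps following the stale pointers. A worker looping like that without a step limit would look much like the 100% CPU spin in the backtrace.
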
On Fri, Mar 24, 2017 at 2:03 PM, Maxim Dounin <mdou...@mdounin.ru> wrote:
> Hello!
>
> On Fri, Mar 24, 2017 at 01:31:35PM +0100, Richard Stanway wrote:
>
>> Hello,
>> I recently moved our site to a new server running Linux 4.9, Debian
>> 8.7 64 bit with nginx 1.11.11 from the nginx repository. Our config is
>> straightforward - epoll, a few proxy backends and a few fastcgi
>> backends, a handful of vhosts, some with HTTP2, geoip module loaded.
>> No AIO, no threads, no timer_resolution.
>>
>> After some time, nginx worker processes are getting stuck at 100% CPU
>> use in what seems to be ngx_http_finalize_connection. New requests
>> hitting the worker are completely stalled. Eventually all nginx
>> workers will become stuck and the sites become unreachable.
>>
>> I'm running older versions of nginx on the same versions of Debian and
>> Linux at other sites without a problem, but the server giving me
>> problems also receives a much larger amount of traffic than the
>> others. Due to the traffic, the debug log gets incredibly large which
>> makes it difficult to isolate the error. I've posted a 1 second
>> excerpt of the core debug log at http://pastebin.com/hqzGzjTV during
>> the time that some of the workers were at 100%, however I'm not sure
>> this contains enough information. I'll look into enabling HTTP level
>> logging if necessary.
>>
>> Has anyone experienced anything similar to this or have any ideas
>> where to start looking to debug this?
>>
>> Thanks.
>>
>> nginx version: nginx/1.11.11
>> built by gcc 4.9.2 (Debian 4.9.2-10)
>
> [...]
>
>> #0 0x000055d533ab87e8 in ngx_pfree (pool=0x55d536202fe0,
>> p=0x55d5361636c0) at src/core/ngx_palloc.c:282
>> #1 0x000055d533af54d9 in ngx_http_set_keepalive (r=<optimized out>)
>> at src/http/ngx_http_request.c:3000
>> #2 ngx_http_finalize_connection (r=<optimized out>) at
>> src/http/ngx_http_request.c:2556
>> #3 0x000055d533af0d8b in ngx_http_core_content_phase
>> (r=0x55d536136f10, ph=0x55d537cbf210) at
>> src/http/ngx_http_core_module.c:1391
>
> I think I see the problem.
> Please try the following patch:
>
> diff --git a/src/http/ngx_http_request.c b/src/http/ngx_http_request.c
> --- a/src/http/ngx_http_request.c
> +++ b/src/http/ngx_http_request.c
> @@ -2904,6 +2904,7 @@ ngx_http_set_keepalive(ngx_http_request_
>          }
>
>          cl->buf = b;
> +        cl->next = NULL;
>
>          hc->busy = cl;
>          hc->nbusy = 1;
>
> --
> Maxim Dounin
> http://nginx.org/
> _______________________________________________
> nginx mailing list
> nginx@nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx