Hi all I have been seeing slow but steady socket leak in nginx ever since I upgraded from 1.0.5 to 1.2.6. I have my custom module in nginx which I was sure what was the leak. This is how I went about investigating: 1. Configure nginx with one worker 2. strace on the worker process, tracing read/readv/write/writev/close/shutdown calls 3. Every now and then, for all the open fds (from ls -l /proc/<pid>/fd), check the socket that is not available in netstat -pane 4. What I saw was, the leaking socket always had the last operation as writev which returned an error. 5. Increased the nginx log level to info and verified that nginx was getting ECONNRESET or EPIPE on writev failure. Which was OK. 6. Traced back in code to see how it is handled, the error translates to CHAIN_ERROR and eventually causes ngx_http_finalize_request to be called. This in turn calls ngx_http_terminate_request.
However, in this function, the request is not terminated if r->write_event_handler is set. This seems to be set if the request handler is a user module. I think the rationale for the check is, if there is a module who is handling the request, dont terminate yet, wait for a write event on the socket and then terminate it (which is why I thought it is setting r->write_event_handler to ngx_http_terminate_handler). I tried to repro this w/ empty_gif_handler however, it sends header and body in one call to writev which I cant get to fail in my test environment. To reproduce the bug, if I replace the call to ngx_http_send_response with ngx_http_send_header and ngx_http_output_filter (as used by ngx_upstream or other modules which dont have the headers and body together), I could reproduce the leak. I have a client that sends a request and closes the socket immediately, nginx sees the error, prints the info log, and then it doesnt close the socket. I have a small patch attached, the fix I did is basically saying that if there is a connection error, there is no point setting write_event_handler as there wont be any activity on the socket, so just terminate it immediately. I could be very wrong in the understanding of the code flow. My patch just fixes this and I am not very sure if this is the right fix. Please let me know. I will try to add a testcase to reproduce this in the nginx test framework. Thank you for your patience. Regards +Fasih
diff --git a/nginx-1.2.6/src/http/ngx_http_request.c b/nginx-1.2.6/src/http/ngx_http_request.c --- a/nginx-1.2.6/src/http/ngx_http_request.c +++ b/nginx-1.2.6/src/http/ngx_http_request.c @@ -2172,7 +2172,7 @@ ngx_http_terminate_request(ngx_http_request_t *r, ngx_int_t rc) "http terminate cleanup count:%d blk:%d", mr->count, mr->blocked); - if (mr->write_event_handler) { + if (mr->write_event_handler && !c->error) { if (mr->blocked) { return;
_______________________________________________ nginx-devel mailing list nginx-devel@nginx.org http://mailman.nginx.org/mailman/listinfo/nginx-devel