https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212920
Bug ID: 212920
Summary: High loaded web server catch race condition on _close () from /lib/libc.so.7 with accf_http
Product: Base System
Version: 10.3-STABLE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: freebsd-b...@freebsd.org
Reporter: fbsd98816...@avksrv.org
CC: freebsd-amd64@FreeBSD.org

Hello!

Recently we upgraded our heavily loaded web server to FreeBSD 10.3-STABLE r305091 and ran into a problem with nginx (nginx-1.10.1_2,2, compiled from the latest ports with mostly default settings). After some time one worker stopped answering requests, and top shows it in the "soclos" state:

    1072 nobody 1 22 0 1698M 65680K soclos 5 0:13 0.00% nginx

A short while later the next worker stops in the same state, and so on until all workers are in "soclos" and the web server stops serving requests (it still accepts connections, which die on timeout after the client has sent a request). Increasing the worker count only postpones the problem by about half an hour. Restarting nginx fixes it, but only for a short time.

The server is fairly heavily loaded, at 1000-2000 requests/sec. It is a frontend proxy using proxy_cache. We tried 2 different physical servers with different NICs and CPUs.

When we rolled the kernel back to r302223 (only the kernel and modules in /boot/kernel, not world), the problem went away. We also tried upgrading to yesterday's r306194; the problem is still there.
Something changed in the kernel code between the end of June and the end of August that causes the problem.

Backtrace from nginx while it is in "soclos":

    #0  0x0000000801a17d28 in _close () from /lib/libc.so.7
    #1  0x000000080098a925 in pthread_suspend_all_np () from /lib/libthr.so.3
    #2  0x00000000004329b9 in ngx_close_connection (c=0x869c1de70) at src/core/ngx_connection.c:1169
    #3  0x0000000000486370 in ngx_http_close_connection (c=0x869c1de70) at src/http/ngx_http_request.c:3543
    #4  0x0000000000488e86 in ngx_http_close_request (r=0x80244c050, rc=408) at src/http/ngx_http_request.c:3406
    #5  0x000000000048d9ed in ngx_http_process_request_headers (rev=0x807810b70) at src/http/ngx_http_request.c:1202
    #6  0x000000000044fdbd in ngx_event_expire_timers () at src/event/ngx_event_timer.c:94
    #7  0x000000000044e60f in ngx_process_events_and_timers (cycle=0x802488050) at src/event/ngx_event.c:256
    #8  0x000000000045f406 in ngx_worker_process_cycle (cycle=0x802488050, data=0xa) at src/os/unix/ngx_process_cycle.c:753
    #9  0x000000000045ae7c in ngx_spawn_process (cycle=0x802488050, proc=0x45f2f0 <ngx_worker_process_cycle>, data=0xa, name=0x53ecea "worker process", respawn=-3) at src/os/unix/ngx_process.c:198
    #10 0x000000000045cc89 in ngx_start_worker_processes (cycle=0x802488050, n=16, type=-3) at src/os/unix/ngx_process_cycle.c:358
    #11 0x000000000045c486 in ngx_master_process_cycle (cycle=0x802488050) at src/os/unix/ngx_process_cycle.c:130
    #12 0x0000000000413288 in main (argc=1, argv=0x7fffffffead0) at src/core/nginx.c:367

    (gdb) list src/core/ngx_connection.c:1169
    1164
    1165        if (c->shared) {
    1166            return;
    1167        }
    1168
    1169        if (ngx_close_socket(fd) == -1) {     <<<<<<<<
    1170
    1171            err = ngx_socket_errno;
    1172
    1173            if (err == NGX_ECONNRESET || err == NGX_ENOTCONN) {

which is actually a call to close(fd):

    #define ngx_close_socket close

All TCP sessions opened by the worker are frozen in their current state.
Also, if we do not load accf_http and do not use it in the nginx config, the problem does not reproduce with any of the 3 tested kernels.

The kernel is GENERIC, with only these extra modules loaded: accf_http, ipmi, smbus, mfip, ums, zfs and opensolaris.

Since accf_http has been doing our server some good, we cannot simply disable the module in the production environment.

I will debug further, but since I am not a good C programmer, it will take some time. If someone knows what changed in the related functions, it may be faster to check from that side.
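
For readers unfamiliar with accf_http, here is a minimal sketch of how a listening socket is attached to the "httpready" accept filter through the SO_ACCEPTFILTER socket option. This is not nginx's own code; the helper name attach_httpready is hypothetical and exists only for illustration. On systems without SO_ACCEPTFILTER the helper is assumed to simply fail with ENOPROTOOPT.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Attach the "httpready" accept filter (provided by the accf_http
 * kernel module) to a socket. Returns 0 on success, or -1 with errno
 * set on failure.
 *
 * attach_httpready is a hypothetical helper name used for this sketch.
 */
int
attach_httpready(int listen_fd)
{
#ifdef SO_ACCEPTFILTER
    struct accept_filter_arg afa;

    /* Zero the whole structure, then name the filter to attach. */
    memset(&afa, 0, sizeof(afa));
    snprintf(afa.af_name, sizeof(afa.af_name), "%s", "httpready");

    /*
     * The socket must already be listening and accf_http must be
     * loaded, otherwise setsockopt() fails.
     */
    return setsockopt(listen_fd, SOL_SOCKET, SO_ACCEPTFILTER,
        &afa, sizeof(afa));
#else
    /* Assumption for this sketch: no accept filter support here. */
    (void)listen_fd;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

In nginx the same filter is requested with the accept_filter=httpready parameter of the listen directive, so with the filter active a connection is passed to accept() only once the kernel has buffered a complete HTTP request.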