Hi, We observed the same issue on our FreeBSD6 and 7 servers. I tried to reproduce the problem writing a simple test case but failed -- I didn't come to the idea of shutdown()/close() sequence (as Anton did). Although looking now at the code we had the issue with I see that shutdown()/close() sequence was used there too.
It looks like SO_LINGER is not important to reproduce ECONNRESET. shutdown()/close() on one end and close() on the other is enough. Also, slowdown of one the processes (done by Anton using select()) is not important too. Taking this into consideration I have wrote a simplified version of a test to reproduce the bug (may be it worth of including to tools/regression/sockets?). I can easily reproduce the error with this test on FreeBSD7.1 and 8-STABLE. Adding some prints to the kernel code I localized the place where the error appears and added panic() to get a backtrace. So, the backtrace: (kgdb) bt #0 doadump () at pcpu.h:246 #1 0xc04ec829 in db_fncall (dummy1=-1064461270, dummy2=0, dummy3=-1, dummy4=0xe85e58b0 "дX^Х") at /usr/src/sys/ddb/db_command.c:548 #2 0xc04ecc5f in db_command (last_cmdp=0xc0e0af9c, cmd_table=0x0, dopager=0) at /usr/src/sys/ddb/db_command.c:445 #3 0xc04ecd14 in db_command_script (command=0xc0e0bec4 "call doadump") at /usr/src/sys/ddb/db_command.c:516 #4 0xc04f0e50 in db_script_exec (scriptname=0xe85e59bc "kdb.enter.panic", warnifnotfound=Variable "warnifnotfound" is not available. ) at /usr/src/sys/ddb/db_script.c:302 #5 0xc04f0f37 in db_script_kdbenter (eventname=0xc0cc78ea "panic") at /usr/src/sys/ddb/db_script.c:324 #6 0xc04eec18 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:228 #7 0xc08d9aa6 in kdb_trap (type=3, code=0, tf=0xe85e5af8) at /usr/src/sys/kern/subr_kdb.c:535 #8 0xc0befecb in trap (frame=0xe85e5af8) at /usr/src/sys/i386/i386/trap.c:690 #9 0xc0bd15eb in calltrap () at /usr/src/sys/i386/i386/exception.s:165 #10 0xc08d9c2a in kdb_enter (why=0xc0cc78ea "panic", msg=0xc0cc78ea "panic") at cpufunc.h:71 #11 0xc08a95b6 in panic (fmt=0xc0ce6585 "ECONNRESET") at /usr/src/sys/kern/kern_shutdown.c:562 #12 0xc0a3d805 in tcp_usr_disconnect (so=0xc715c670) at /usr/src/sys/netinet/tcp_usrreq.c:552 #13 0xc09111bd in sodisconnect (so=0xc715c670) at /usr/src/sys/kern/uipc_socket.c:810 #14 0xc0914144 in soclose (so=0xc715c670) at /usr/src/sys/kern/uipc_socket.c:658 #15 0xc08f6459 in soo_close (fp=0xc743e230, td=0xc7023000) at /usr/src/sys/kern/sys_socket.c:291 #16 0xc086efc3 in _fdrop (fp=0xc743e230, td=0xc7023000) at file.h:293 #17 0xc0870cf0 in closef (fp=0xc743e230, td=0xc7023000) at /usr/src/sys/kern/kern_descrip.c:2117 #18 0xc0871097 in kern_close (td=0xc7023000, fd=4) at /usr/src/sys/kern/kern_descrip.c:1162 #19 0xc087123a in close (td=0xc7023000, uap=0xe85e5cf8) at /usr/src/sys/kern/kern_descrip.c:1114 #20 0xc0bef600 in syscall (frame=0xe85e5d38) at /usr/src/sys/i386/i386/trap.c:1111 #21 0xc0bd1680 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:261 #22 0x00000033 in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) fr 12 #12 0xc0a3d805 in tcp_usr_disconnect (so=0xc715c670) at /usr/src/sys/netinet/tcp_usrreq.c:552 552 panic("ECONNRESET"); (kgdb) list 547 inp = sotoinpcb(so); 548 KASSERT(inp != NULL, ("tcp_usr_disconnect: inp == NULL")); 549 INP_WLOCK(inp); 550 if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) { 551 error = ECONNRESET; 552 panic("ECONNRESET"); 553 /* log(LOG_INFO, "ECONNRESET 3: file %s; line %d\n", __FILE__, __LINE__); */ 554 goto out; 555 } 556 tp = intotcpcb(inp); (kgdb) p/x inp->inp_flags $1 = 0x4800000 #define INP_DROPPED 0x04000000 /* protocol drop flag */ (kgdb) fr 14 #14 0xc0914144 in soclose (so=0xc715c670) at /usr/src/sys/kern/uipc_socket.c:658 658 error = sodisconnect(so); (kgdb) list 653 654 CURVNET_SET(so->so_vnet); 655 funsetown(&so->so_sigio); 656 if (so->so_state & SS_ISCONNECTED) { 657 if ((so->so_state & SS_ISDISCONNECTING) == 0) { 658 error = sodisconnect(so); 659 if (error) { 660 if (error == ENOTCONN) 661 error = 0; 662 goto drop; (kgdb) p/x so->so_state $2 = 0x2000 #define SS_ISDISCONNECTED 0x2000 /* socket disconnected from peer */ (the code differs a bit here from original 8-STABLE version as it includes patch from kern/144061 to deal with ENOTCONN errors but this does not influence). Actually, I think that the issue here is similar to kern/144061. In the code above so_state is checked for SS_ISCONNECTED and SS_ISDISCONNECTING without locking and then sodisconnect() is called. It looks like the following may happen: 1) after shutdown() our output is closed; 2) then we call close(), soclose() checks that we are still in SS_ISCONNECTED and calls sodisconnect(); 3) at this time FIN arrives from the other end, which has called close() too, and the kernel disconnects the socket (INP_DROPPED is set); 4) sodisconnect()/tcp_usr_disconnect() checks for INP_DROPPED and returns ECONNRESET. I am attaching the patch, which may not be a solution but rather for illustration to described above. Running the test with this patch I am observing the following messages in error logs May 27 23:55:41 zhuzha kernel: ECONNRESET: so->state: 0x2000; file /usr/src/sys/kern/uipc_socket.c; line 664 and test does not fail. -- Mikolaj Golub
tcp_close.c
Description: Binary data
uipc_socket.c.econnreset.patch
Description: Binary data
_______________________________________________ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"