On Wednesday, August 06, 2014 5:25:38 pm Jeremiah Lott wrote:
> Hello,
>
> We've been seeing a problem where a tcp connection is stuck in a zero
> window condition and even though the client has opened more window space,
> our FreeBSD box never sends any more. After some analysis it appears that
> the FreeBSD box is not sending zero window probes, because the persist
> timer did not get set (we can see in kgdb that the tcpcb shows 0 window,
> there is data in the socket buffer, but the persist timer is not active).
>
> After looking over the code for a while, I think I see the problem. When
> tcp_output chooses to send a packet, it never arms the persist timer. This
> causes a problem in the following scenario:
>
> 1. A --> B: packet containing enough data to fill the window
> 2. B --> A: ACK for #1 + new data (0 window advertisement)
> 3. A --> B: ACK for #2, 0 len packet
>
> In this case, A will not activate the persist timer, because it chose to
> send a packet. Unless tcp_output is called for some other reason (delayed
> ack timer, another input packet from B, socket syscall), A will not send
> zero window probes. I was finally able to recreate this condition by
> setting a very small window and running programs that send very specific
> sequences of packets without calling recv (purposefully forcing a zero
> window condition).
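The scenario above can be reduced to a toy model. What follows is a minimal sketch, not the kernel code: `struct toy_conn`, `toy_output`, and its `ack_now` and `patched` parameters are hypothetical names made up for illustration. The model only captures the timer decision, and it assumes that the "nothing to send" path is where the persist timer normally gets armed.

```c
#include <stdbool.h>

/* Hypothetical, simplified sender state; not the kernel's struct tcpcb. */
struct toy_conn {
	unsigned snd_wnd;	/* window last advertised by the peer */
	unsigned sb_cc;		/* unsent bytes waiting in the send buffer */
	unsigned in_flight;	/* bytes sent but not yet acknowledged */
	bool rexmt_active;	/* retransmit timer armed */
	bool persist_active;	/* persist timer armed */
};

/*
 * Toy stand-in for the timer decision made when output is attempted.
 * ack_now: an ACK must go out even if no data fits in the window.
 * patched: also apply the extra check proposed in the quoted patch.
 * Returns true if a segment (possibly a bare ACK) was sent.
 */
static bool
toy_output(struct toy_conn *c, bool ack_now, bool patched)
{
	unsigned len = c->sb_cc < c->snd_wnd ? c->sb_cc : c->snd_wnd;

	if (len == 0 && !ack_now) {
		/* Nothing to send: this is the path that arms persist. */
		if (c->sb_cc > 0 && !c->rexmt_active && !c->persist_active)
			c->persist_active = true;
		return (false);
	}

	/* Send path: data (if any) goes out and may need retransmission. */
	c->sb_cc -= len;
	c->in_flight += len;
	if (!c->rexmt_active && c->in_flight > 0) {
		c->rexmt_active = true;
	} else if (patched && len == 0 && c->sb_cc > 0 &&
	    !c->rexmt_active && !c->persist_active) {
		/* Bare ACK with data still queued: arm persist instead. */
		c->persist_active = true;
	}
	return (true);
}
```

With `snd_wnd = 0`, `sb_cc = 1656` and nothing in flight, `toy_output(&c, true, false)` sends the bare ACK and leaves both timers idle, which is the stall described above; `toy_output(&c, true, true)` arms the persist timer.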
> Here is a packet capture that shows the sequence:
>
> A == 10.2.15.69 == FreeBSD 9.2
> B == 10.2.14.61 == FreeBSD 8.2
>
> 16:19:49.664790 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [S], seq 2362665163, win 4300, options [mss 1460,nop,wscale 6,sackOK,TS val 88804503 ecr 0], length 0
> 16:19:49.664821 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [S.], seq 3306387947, ack 2362665164, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 1605043666 ecr 88804503], length 0
> 16:19:49.664859 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [.], ack 1, win 67, options [nop,nop,TS val 88804503 ecr 1605043666], length 0
> 16:19:49.664921 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 1:101, ack 1, win 67, options [nop,nop,TS val 88804503 ecr 1605043666], length 100
> 16:19:49.665137 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [P.], seq 1:3001, ack 101, win 2046, options [nop,nop,TS val 1605043666 ecr 88804503], length 3000
> 16:19:49.665208 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 101:1321, ack 1449, win 45, options [nop,nop,TS val 88804503 ecr 1605043666], length 1220
> 16:19:49.666195 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [.], seq 1321:2769, ack 3001, win 21, options [nop,nop,TS val 88804504 ecr 1605043666], length 1448
> 16:19:49.666205 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [.], ack 2769, win 2004, options [nop,nop,TS val 1605043667 ecr 88804503], length 0
> 16:19:49.666207 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 2769:2771, ack 3001, win 21, options [nop,nop,TS val 88804504 ecr 1605043666], length 2
> 16:19:49.667183 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [.], seq 2771:4219, ack 3001, win 21, options [nop,nop,TS val 88804505 ecr 1605043667], length 1448
> 16:19:49.667190 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [.], seq 3001:4345, ack 4219, win 1982, options [nop,nop,TS val 1605043668 ecr 88804504], length 1344
> 16:19:49.667193 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 4219:4221, ack 3001, win 21, options [nop,nop,TS val 88804505 ecr 1605043667], length 2
> 16:19:49.766487 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 4221:4321, ack 4345, win 0, options [nop,nop,TS val 88804605 ecr 1605043668], length 100
> 16:19:49.766499 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [.], ack 4321, win 1980, options [nop,nop,TS val 1605043768 ecr 88804505], length 0
>
> The important packets are the last four:
>
> 1. A --> B: length 1344, fills the remaining window
> 2. B --> A: length 2, does not ack additional data, delayed ack timer is set
> 3. B --> A: length 100, acks #1, immediate ack (delayed ack timer
>    cancelled, tcp_output called with ACKNOW)
> 4. A --> B: length 0, acks #1 and #2, because a packet is sent tcp_output
>    does not activate the persist timer.
>
> I would normally expect A to begin sending zero-window probes, but (since
> it didn't activate the persist timer) it does not. Using kgdb, I can see
> that the persist timer is not set, only the keep timer is set.
> This is kgdb on "A":
>
> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_nxt
> $5 = 3306392292
> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_max
> $6 = 3306392292
> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_una
> $7 = 3306392292
> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_wnd
> $8 = 0
> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_cwnd
> $9 = 4380
> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_rexmt->c_flags
> $11 = 16
> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_persist->c_flags
> $12 = 16
> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_keep->c_flags
> $13 = 22
> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_2msl->c_flags
> $14 = 16
> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_delack->c_flags
> $15 = 16
> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_inpcb->inp_socket.so_snd.sb_cc
> $16 = 1656
>
> There is zero window, data in the socket buffer, and the persist timer is
> not set.
>
> My proposed fix follows. If you send a 0-length packet, but there is data
> in the socket buffer, and neither the rexmt nor the persist timer is
> already set, then activate the persist timer.
>
> --- sys/netinet/tcp_output.c	(revision 269644)
> +++ sys/netinet/tcp_output.c	(working copy)
> @@ -1290,7 +1290,12 @@
>  				tp->t_rxtshift = 0;
>  			}
>  			tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);
> -		}
> +		} else if (len == 0 && so->so_snd.sb_cc &&
> +		    !tcp_timer_active(tp, TT_REXMT) &&
> +		    !tcp_timer_active(tp, TT_PERSIST)) {
> +			tp->t_rxtshift = 0;
> +			tcp_setpersist(tp);
> +		}
>  	} else {
>  		/*
>  		 * Persist case, update snd_max but since we are in
>
> Let me know any comments. Thanks,
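For readers decoding the `c_flags` values in the kgdb output above, here is a small sketch. The bit values are an assumption, written down as I recall them from `sys/callout.h` in FreeBSD 9.x, so check them against your own tree; the `TOY_` macros and `toy_` helpers are hypothetical names, not kernel symbols. The sketch also assumes that a timer counts as set when its callout has the active bit.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed bit values, as recalled from sys/callout.h in FreeBSD 9.x. */
#define	TOY_CALLOUT_ACTIVE		0x0002	/* callout is currently active */
#define	TOY_CALLOUT_PENDING		0x0004	/* callout is waiting to fire */
#define	TOY_CALLOUT_RETURNUNLOCKED	0x0010	/* handler returns unlocked */

/* True if the active bit is set in a raw c_flags value. */
static bool
toy_callout_active(int c_flags)
{
	return ((c_flags & TOY_CALLOUT_ACTIVE) != 0);
}

/* True if the pending bit is set in a raw c_flags value. */
static bool
toy_callout_pending(int c_flags)
{
	return ((c_flags & TOY_CALLOUT_PENDING) != 0);
}

/* Print one timer's state in a form comparable to the kgdb dump. */
static void
toy_describe(const char *name, int c_flags)
{
	printf("%-8s c_flags=%2d active=%d pending=%d\n", name, c_flags,
	    toy_callout_active(c_flags), toy_callout_pending(c_flags));
}
```

Under those assumptions, 16 is the idle state (only the "returns unlocked" bit) and 22 is 16 + 4 + 2, that is, pending and active. Calling `toy_describe("persist", 16)` and `toy_describe("keep", 22)` reproduces the reading given in the message: the keep timer is armed and the persist timer is not.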

I think your patch is correct, but please file this as a bug report so we can
hopefully wrangle another person to review this.

--
John Baldwin
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"