I realized, late last night, that I was wrong on a few
details concerning this bug:

1.) The retransmit timer does not keep popping without
being restarted.

2.) ip_output() must return ENOBUFS (TCP_MAXRXTSHIFT + 1) times
to the same, non-transmitting TCP.

3.) Given a TCP as described below, when tcp_output() uses ENOBUFS
to blindly start the retransmit timer then tp->t_rxtshift will be
falsely incremented and never cleared.

Thus the bug manifests itself because, for a TCP that never
transmits, nobody ever clears tp->t_rxtshift;
this allows tp->t_rxtshift to slowly count up to TCP_MAXRXTSHIFT;
once TCP_MAXRXTSHIFT is exceeded, tcp_timer_rexmt() will
kill the poor innocent TCP.
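
To make the counting in points 1-3 concrete, here is a toy user-space
model of the sequence. It is only a sketch: struct toy_tcpcb and the
toy_* functions are hypothetical stand-ins, not kernel code, and the
model assumes (as described above) that each ENOBUFS arms the timer
once, each pop increments t_rxtshift once, and nothing ever resets it.

```c
#include <assert.h>
#include <stdbool.h>

#define TCP_MAXRXTSHIFT 12      /* limit referred to in the text above */

/* Hypothetical, stripped-down stand-in for struct tcpcb. */
struct toy_tcpcb {
    int  t_rxtshift;    /* retransmit backoff counter */
    bool rexmt_active;  /* models callout_active(tp->tt_rexmt) */
    bool dropped;       /* set once the connection is killed */
};

/* Models the ENOBUFS branch at the end of tcp_output(). */
static void toy_output_enobufs(struct toy_tcpcb *tp)
{
    if (!tp->rexmt_active)
        tp->rexmt_active = true;    /* timer armed for a pure ACK */
}

/* Models one pop of tcp_timer_rexmt(): one increment, no re-arm,
 * because a pure receiver has nothing to retransmit. */
static void toy_timer_rexmt(struct toy_tcpcb *tp)
{
    assert(tp->rexmt_active);
    tp->rexmt_active = false;
    if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
        tp->t_rxtshift = TCP_MAXRXTSHIFT;
        tp->dropped = true;         /* application would see ETIMEDOUT */
    }
}

/* Count the ENOBUFS events needed to kill a TCP that never transmits. */
static int toy_enobufs_until_drop(void)
{
    struct toy_tcpcb tp = { 0, false, false };
    int events = 0;

    while (!tp.dropped) {
        toy_output_enobufs(&tp);    /* ip_output() returned ENOBUFS */
        toy_timer_rexmt(&tp);       /* timer pops once */
        events++;
        /* No ACK for new data ever arrives, so t_rxtshift is never cleared. */
    }
    return events;
}
```

Under these assumptions toy_enobufs_until_drop() returns
TCP_MAXRXTSHIFT + 1, which matches point 2.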

On 02/01/07 17:23, Dave Baukus wrote:
There is a bug in tcp_output() for at least FreeBSD 6.1
that causes a perfectly good TCP to be dropped by its
retransmit timer; the application receives ETIMEDOUT.

Consider a TCP that never transmits (the receive end of the ttcp
utility is an example). While the TCP is established,
snd_max == snd_una == snd_nxt == (iss + 1) and the retransmit
timer should never be started. If the retransmit timer is started
then it is never stopped by tcp_input()/tcp_output() because
snd_max == snd_una == snd_nxt (always). Once started the
timer continues its count up till tp->t_rxtshift == 12 and
the connection that never transmitted gets falsely killed.

The bug is to blindly rely on the return value of ip_output().
If ip_output() returns ENOBUFS then the retransmit timer is
activated:

From the end of tcp_output():
out:
        SOCKBUF_UNLOCK_ASSERT(&so->so_snd);    /* Check gotos. */
        if (error == ENOBUFS) {
                if (!callout_active(tp->tt_rexmt) &&
                    !callout_active(tp->tt_persist))
                        callout_reset(tp->tt_rexmt, tp->t_rxtcur,
                            tcp_timer_rexmt, tp);
                tp->snd_cwnd = tp->t_maxseg;
                return (0);
        }

My simple-minded fix would be not to start the retransmit timer;
if tcp_output() wanted to time this transmit it would have started
the timer up above.
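
Roughly, the idea looks like the following user-space sketch. It is
not a tested kernel patch: struct fix_tcpcb and fix_enobufs_tail() are
hypothetical names made up for illustration, and the sketch assumes the
rest of the ENOBUFS handling (collapsing snd_cwnd to one segment and
returning 0) stays as it is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few tcpcb fields the ENOBUFS path touches. */
struct fix_tcpcb {
    unsigned long snd_cwnd;     /* congestion window */
    unsigned int  t_maxseg;     /* maximum segment size */
    bool          rexmt_active; /* models callout_active(tp->tt_rexmt) */
};

/*
 * Models the ENOBUFS branch with the proposed change: the timer state
 * is left exactly as the earlier part of tcp_output() set it, and only
 * the congestion window is collapsed.
 */
static int fix_enobufs_tail(struct fix_tcpcb *tp)
{
    assert(tp != NULL);
    /* No callout_reset() equivalent here: rexmt_active is not touched. */
    tp->snd_cwnd = tp->t_maxseg;
    return 0;
}
```

With this shape, a TCP that only sends pure ACKs never has its
retransmit timer armed by the ENOBUFS path, so t_rxtshift stays at 0.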

This ETIMEDOUT problem is easily recreated on any old machine
using a single slow ethernet device and the ttcp test utility.
First, fire up a couple ttcp receivers. Second, flood the same
interface with enough ttcp transmitters to cause the driver's transmit
ring and interface queue to back up. Eventually, one of the ttcp
receivers will get ENOBUFS from ip_output() and the retransmit
timer will be wrongly activated for a pure ACK segment.

I was able to do it w/ the following on FreeBSD 6.1:

box1:
ttcp -s -l 16384 -p 9444 -v -b 128000 -r
ttcp -s -l 16384 -p 9445 -v -b 128000 -r
ttcp -s -n 6553600 -l 4096 -p 9446 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 333  -p 9447 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 8192  -p 9448 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 333  -p 9449 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 8192  -p 9450 -v -b 128000 -t 192.168.222.13

box2:
ttcp -s -n 6553600 -l 8192 -p 9444 -v -b 128000 -t  192.168.222.222
ttcp -s -n 9999999 -l 128  -p 9445 -v -b 128000  -t  192.168.222.222
ttcp -s -l 16384 -p 9446 -v -b 128000 -r
ttcp -s -l 16384 -p 9447 -v -b 128000 -r
ttcp -s -l 16384 -p 9448 -v -b 128000 -r
ttcp -s -l 16384 -p 9449 -v -b 128000 -r
ttcp -s -l 16384 -p 9450 -v -b 128000 -r


--
Dave Baukus
   [EMAIL PROTECTED]
   972-479-2491

   Fujitsu Network Communications
         Richardson, Texas
                 USA