This part of the thread sounds really familiar. I recall someone coming up with a patch for this a few weeks ago, possibly committing it to -current. I'm too tired and it's too late, though; I'll look for it tomorrow if Matt doesn't find the thread in the archives first.
Mike "Silby" Silbersack

On Sun, 2 Dec 2001, Matthew Dillon wrote:

>
> :curious, as the loopback's MTU is normally 16384.
> :Also, any idea on where does the 4096 limit (1460*2+1176) come from ?
> :
> :	cheers
> :	luigi
>
>     It comes from the size of an mbuf, which is 2K.  If you are trying to
>     send 4100 bytes of data what winds up happening is this:
>
>     * construct 2048 byte mbuf and queue (TF_MORETOCOME set)
>         1460 byte packet gets pushed out
>     * construct 2048 byte mbuf and queue (TF_MORETOCOME set)
>         1460 byte packet gets pushed out
>         (1176 bytes left over in mbuf)
>     <<--- ack is received (semi synchronous)
>         1176 bytes in transmit buffer are pushed out due to the ack
>     * construct 4 byte mbuf and queue (TF_MORETOCOME clear)
>         4 bytes is pushed out due to TCP_NOWAIT being set.
>
>     There are two localhost MTUs.  If you use 'localhost' the MTU is 16384.
>     If you use the IP address of an ethernet interface on the machine the
>     MTU winds up being 1500 even though it is effectively a localhost
>     connection.  An MTU of 1500 generates the 1460 byte push-outs.
>
>     However, even with an MTU of 16384 you still have the same problem when
>     sending, say, 16384+2052 bytes of data.  After it pushed out a 16384 byte
>     segment it winds up with 2048 bytes queued in the mbuf and a
>     received ack (again, semi synchronous because this is localhost) will
>     cause it to push out the 2048 bytes prematurely, before the last 4 bytes
>     can get queued.
>
>     What we need is a mechanism in the tcp_input() code to NOT call
>     tcp_output() when an ACK is received, under certain circumstances.
>     I was thinking of taking the TF_MORETOCOME flag and causing it to be
>     left set for the duration of the write (except for the last sub-write).
>     At the moment it is set and cleared for each sub-write and the ack wiggles
>     its way in while it happens to be clear.  In any case, this would allow
>     tcp_input() to skip calling tcp_output() prematurely.  But it isn't so
>     easy to implement since the TF_ flags are in the 'tp' structure, not
>     the 'so' socket structure, and higher levels do not have direct access
>     to the tcp-specific 'tp' structure.
>
>                                         -Matt
>                                         Matthew Dillon
>                                         <[EMAIL PROTECTED]>
>
>
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-hackers" in the body of the message
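
For anyone who wants to check the arithmetic in Matt's walkthrough, here is a rough user-space sketch. It is a toy model, not the kernel code: the function name simulate_write, its parameters, and the rule that an ACK shows up after a fixed number of full-sized segments are all assumptions made for illustration. The only behavior taken from the quoted message is that data is queued in 2048-byte mbufs, full-sized segments go out immediately, and an incoming ACK flushes whatever partial data is queued. The defer_ack_push switch imitates the proposed idea of leaving TF_MORETOCOME set until the last sub-write.

```python
def simulate_write(total_bytes, mss=1460, mbuf_size=2048,
                   ack_after_segments=2, defer_ack_push=False):
    """Toy model of one large write; returns the sizes of the segments sent.

    Assumptions (not taken from the kernel source):
      - an ACK arrives once `ack_after_segments` full-sized segments
        have gone out since the previous ACK;
      - that ACK flushes any partial data still queued, unless
        `defer_ack_push` is set (the proposed fix).
    """
    segments = []
    queued = 0            # bytes sitting in the transmit buffer
    remaining = total_bytes
    full_since_ack = 0    # full-sized segments sent since the last ACK

    while remaining > 0:
        # One sub-write: copy at most one mbuf's worth of data.
        chunk = min(mbuf_size, remaining)
        remaining -= chunk
        queued += chunk
        more_to_come = remaining > 0

        # Full-sized segments are always pushed out right away.
        while queued >= mss:
            segments.append(mss)
            queued -= mss
            full_since_ack += 1

        if not more_to_come:
            # Last sub-write: push out whatever is left.
            if queued > 0:
                segments.append(queued)
                queued = 0
        elif (not defer_ack_push and queued > 0
              and full_since_ack >= ack_after_segments):
            # ACK arrives between sub-writes and flushes the partial data.
            segments.append(queued)
            queued = 0
            full_since_ack = 0

    return segments


if __name__ == "__main__":
    print(simulate_write(4100))
    print(simulate_write(4100, defer_ack_push=True))
```

Under those assumptions a 4100-byte write comes out as 1460, 1460, 1176 and 4 bytes, which is the 4096-byte boundary luigi asked about. With defer_ack_push=True the tail goes out as a single 1180-byte segment instead. Calling it with mss=16384 and ack_after_segments=1 for 16384+2052 bytes shows the same premature 2048-byte push on the loopback MTU.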