On Wed, Feb 04, 2009 at 11:34:25AM +0100, Claudio Jeker wrote:
> On Tue, Feb 03, 2009 at 04:28:36PM +0000, Dieter wrote:
> > > > > > How high is too high?  I have a utility that sets recv buf size
> > > > > > to 100,000,000 and it works fine on FreeBSD and NetBSD.  (Not
> > > > > > tested yet on OpenBSD.)  Necessary when the other end has buggy
> > > > > > network code and insufficient send buf.
> > > > >
> > > > > Could you clarify what you mean by that?
> > > >
> > > > Black box sends data to BSD box using TCP.  Data is generated in
> > > > real time, the rate cannot be changed.  Black box has a very small
> > > > (way too small) send buffer.  If the BSD box takes too long to
> > > > ack, the black box's send buffer fills up and data is lost,
> > > > and/or the black box's buggy firmware screws up and data is lost.
> > > > So I have to do everything I can to ensure that incoming packets
> > > > do not get dropped, and that the acks get sent out as fast as
> > > > possible.  Making the TCP recv buffer very large allows the
> > > > incoming packets to get stored and acked, even if the userland
> > > > process reading the data doesn't get to run often enough.  Even
> > > > so, there is still the problem that other device drivers can and
> > > > do lock out the Ethernet driver for too long.  Still working on
> > > > that problem.  What we really need is true real time facilities.
> > > >
> > > > It is a latency problem, not a throughput problem.  If the black
> > > > box were FLOSS instead of evil closed source it should be possible
> > > > to fix the buggy network code.
> > >
> > > A) A huge recv buffer does not solve your ACK problem.
> > > B) The recv buffer is only affected by either the global
> > >    net.inet.tcp.recvspace or the setsockopt SO_RCVBUF.
> > > C) The socket buffers are limited to 256kB.
> > > D) Instead of playing with knobs that don't really do what you think
> > >    they will do, you should make your userland app read faster.
> >
> > It is a workaround.  The way to *solve* the problem is with a true
> > real time system.
>
> No, it is not.  A real-time OS does not do what you think it will do.
> Real-time OSes ensure that a process is able to process an event within
> a defined time.  They do not allow a process to go out for lunch, come
> back after an hour, and get all the missed data.
>
> > Grepping through a few log files, the userland program read 44,751,896
> > bytes with a single syscall.  The default recv buf size of 65536
> > doesn't get the job done for this application.
>
> Then your application is badly designed.  The socket layer, and
> especially TCP, will try to keep the usage of the recv buffer down by
> signaling the remote end to back off.  It is not the duty of the socket
> layer to queue more than 40MB of data inside the kernel (and perhaps run
> the kernel out of memory because of that).  We will not support
> preposterous socket buffer sizes.  Fix your userland application to do
> smaller reads more often; that's why there are such nice things as
> select or poll.  Every CS student who has taken an IPC-in-Unix course
> should be able to write this correctly.  (/me is still optimistic about
> the amount of knowledge the average CS student has.)
>
> > It doesn't matter how fast the userland program is if it doesn't get
> > run often enough.  I have no way to guarantee how often a userland
> > program is run.  I have to estimate, add a safety factor, and size the
> > buffers accordingly.
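[Editorial aside on points B and C above: the knob being turned here is
SO_RCVBUF, and the requested size is only a hint; the kernel clamps or
rejects anything above its socket-buffer limit (the roughly 256 kB Claudio
mentions on OpenBSD).  Below is a minimal sketch, assuming an ordinary
BSD-sockets environment, of asking for the 100,000,000-byte buffer from the
thread and then checking what the kernel actually granted.  It is an
illustration, not code from the thread.]

#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>

int
main(void)
{
	int s, requested = 100000000;	/* the 100,000,000 from the thread */
	int granted = 0;
	socklen_t len = sizeof(granted);

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
		perror("socket");
		return 1;
	}

	/*
	 * Ask for a huge receive buffer.  Depending on the system this is
	 * silently clamped or rejected outright, so never assume the
	 * request was honored.
	 */
	if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &requested,
	    sizeof(requested)) == -1)
		perror("setsockopt(SO_RCVBUF)");

	/* Read back what the kernel actually granted. */
	if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &granted, &len) == -1)
		perror("getsockopt(SO_RCVBUF)");
	else
		printf("requested %d bytes, kernel granted %d bytes\n",
		    requested, granted);
	return 0;
}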
> > As far as I can tell the only remaining problem is when other device
> > drivers lock out the Ethernet driver for too long.  Nothing I do to
> > the userland program will change that.  I have to figure out what
> > driver(s) it is, and then figure out how to fix it.  At this point,
> > problems are very rare.
>
> Humbug.  Your userland program is not well behaved, and that has nothing
> to do with how fast the box is or whether the Ethernet driver is locked
> out for too long.
>
> Our socket buffers will never allow that amount of memory to be queued.
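[Editorial aside: for reference, this is roughly the pattern Claudio is
pointing at with "smaller reads more often" and select/poll: block in
poll() and drain the socket in modest chunks as soon as data arrives.  A
minimal sketch; the socket setup is omitted, drain_socket() is just an
illustrative name, the 64 kB chunk size is arbitrary, and stdout stands in
for whatever really consumes the data.]

#include <sys/types.h>
#include <poll.h>
#include <unistd.h>
#include <err.h>

#define CHUNK (64 * 1024)	/* roughly one socket buffer per read */

void
drain_socket(int s)
{
	struct pollfd pfd;
	char buf[CHUNK];
	ssize_t n;

	pfd.fd = s;
	pfd.events = POLLIN;

	for (;;) {
		/* Block until the kernel has data queued for us. */
		if (poll(&pfd, 1, -1) == -1)
			err(1, "poll");
		if ((pfd.revents & (POLLIN | POLLHUP)) == 0)
			continue;
		n = read(s, buf, sizeof(buf));
		if (n == -1)
			err(1, "read");
		if (n == 0)
			break;		/* peer closed the connection */
		/* Hand the chunk to the consumer; stdout is a stand-in. */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(STDOUT_FILENO, buf + off, n - off);
			if (w == -1)
				err(1, "write");
			off += w;
		}
	}
}

[If the consumer cannot keep up, the receive buffer fills and TCP's flow
control tells the sender to back off, which is the behavior Claudio
describes; a multi-megabyte in-kernel queue is not needed for that.]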
I think Claudio doesn't know that Step 1 in "solving" userland throughput
problems is to blame them on the kernel, hardware, drivers, or actually
anything except the application?  And I see the alternative "all my
problems would be solved if OpenBSD had feature X" (in this case real-time
support) is also used, so extra bonus points!

Anyway, I agree totally with Claudio here.  I've had to slap co-workers too
many times for various offenses like always setting TCP_NODELAY (or not
setting it when it needs to be set), or other inventive ways to "solve"
TCP throughput-related issues.

	-Otto
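[Editorial aside: for readers who haven't met the knob Otto is complaining
about, TCP_NODELAY disables Nagle's algorithm so small writes leave
immediately instead of being coalesced; whether that helps or hurts depends
entirely on the traffic pattern, which is why setting it blindly (or
forgetting it when it matters) counts as an offense.  A minimal sketch,
assuming a connected TCP socket; set_nodelay() is just an illustrative
wrapper name.]

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>

/*
 * Disable (on = 1) or re-enable (on = 0) Nagle's algorithm on a connected
 * TCP socket.  Disabling it favors latency-sensitive request/response
 * traffic and does nothing good for bulk transfers.
 */
int
set_nodelay(int s, int on)
{
	if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == -1) {
		perror("setsockopt(TCP_NODELAY)");
		return -1;
	}
	return 0;
}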