On Wed, Feb 04, 2009 at 11:34:25AM +0100, Claudio Jeker wrote:
> On Tue, Feb 03, 2009 at 04:28:36PM +0000, Dieter wrote:
> > > > > > How high is too high?  I have a utility that sets recv buf size
> > > > > > to 100,000,000 and it works fine on FreeBSD and NetBSD.  (Not
> > > > > > tested yet on OpenBSD.)  Necessary when the other end has buggy
> > > > > > network code and insufficient send buf.
> > > > >
> > > > > Could you clarify what you mean by that?
> > > >
> > > > Black box sends data to BSD box using TCP.  Data is generated in
> > > > real time, the rate cannot be changed.  Black box has a very small
> > > > (way too small) send buffer.  If the BSD box takes too long to
> > > > ack, the black box's send buffer fills up and data is lost,
> > > > and/or the black box's buggy firmware screws up and data is lost.
> > > > So I have to do everything I can to ensure that incoming packets
> > > > do not get dropped, and that the acks get sent out as fast as
> > > > possible.  Making the TCP recv buffer very large allows the
> > > > incoming packets to get stored and acked, even if the userland
> > > > process reading the data doesn't get to run often enough.  Even
> > > > so, there is still the problem that other device drivers can and
> > > > do lock out the Ethernet driver for too long.  Still working on
> > > > that problem.  What we really need is true real time facilities.
> > > >
> > > > It is a latency problem, not a throughput problem.  If the black
> > > > box were FLOSS instead of evil closed source it should be possible
> > > > to fix the buggy network code.
> > >
> > > A) A huge recv buffer does not solve your ACK problem.
> > > B) The recv buffer is only affected by either the global
> > >    net.inet.tcp.recvspace or the setsockopt SO_RCVBUF.
> > > C) The socket buffers are limited to 256kB.
> > > D) Instead of playing with knobs that don't really do what you think
> > >    they will do, you should make your userland app read faster.
> >
> > It is a workaround.  The way to *solve* the problem is with a true
> > real time system.
>
> No, it is not.  A real-time OS does not do what you think it will do.
> Real-time OSes ensure that a process is able to process an event within
> a defined time.  They do not allow a process to go out for lunch, come
> back after an hour, and get all the missed data.
>
> > Grepping through a few log files, the userland program read 44,751,896
> > bytes with a single syscall.  The default recv buf size of 65536
> > doesn't get the job done for this application.
>
> Then your application is badly designed.  The socket layer, and
> especially TCP, will try to keep the usage of the recv buffer down by
> signaling the remote end to back off.  It is not the duty of the socket
> layer to queue more than 40MB of data inside the kernel (and perhaps run
> the kernel out of memory because of that).  We will not support
> preposterous socket buffer sizes.  Fix your userland application to do
> smaller reads more often; that's why there are such nice things as
> select or poll.  Every CS student who has taken an IPC-in-Unix course
> should be able to write this correctly.  (/me is still optimistic about
> the amount of knowledge the average CS student has.)
>
> > It doesn't matter how fast the userland program is if it doesn't get
> > run often enough.  I have no way to guarantee how often a userland
> > program is run.  I have to estimate, add a safety factor, and size the
> > buffers accordingly.
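[Editorial aside on points B and C above: the knob being turned here is
SO_RCVBUF, and the requested size is only a hint; the kernel clamps or
rejects anything above its socket-buffer limit (the roughly 256 kB Claudio
mentions on OpenBSD).  Below is a minimal sketch, assuming an ordinary
BSD-sockets environment, of asking for the 100,000,000-byte buffer from the
thread and then checking what the kernel actually granted.  It is an
illustration, not code from the thread.]

#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>

int
main(void)
{
	int s, requested = 100000000;	/* the 100,000,000 from the thread */
	int granted = 0;
	socklen_t len = sizeof(granted);

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
		perror("socket");
		return 1;
	}

	/*
	 * Ask for a huge receive buffer.  Depending on the system this is
	 * silently clamped or rejected outright, so never assume the
	 * request was honored.
	 */
	if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &requested,
	    sizeof(requested)) == -1)
		perror("setsockopt(SO_RCVBUF)");

	/* Read back what the kernel actually granted. */
	if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &granted, &len) == -1)
		perror("getsockopt(SO_RCVBUF)");
	else
		printf("requested %d bytes, kernel granted %d bytes\n",
		    requested, granted);
	return 0;
}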
> > As far as I can tell the only remaining problem is when other device
> > drivers lock out the Ethernet driver for too long.  Nothing I do to
> > the userland program will change that.  I have to figure out what
> > driver(s) it is, and then figure out how to fix it.  At this point,
> > problems are very rare.
>
> Humbug.  Your userland program is not well behaved, and that has nothing
> to do with how fast the box is or whether the Ethernet driver is locked
> out for too long.
>
> Our socket buffers will never allow that amount of memory to be queued.
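[Editorial aside: for reference, this is roughly the pattern Claudio is
pointing at with "smaller reads more often" and select/poll: block in
poll() and drain the socket in modest chunks as soon as data arrives.  A
minimal sketch; the socket setup is omitted, drain_socket() is just an
illustrative name, the 64 kB chunk size is arbitrary, and stdout stands in
for whatever really consumes the data.]

#include <sys/types.h>
#include <poll.h>
#include <unistd.h>
#include <err.h>

#define CHUNK (64 * 1024)	/* roughly one socket buffer per read */

void
drain_socket(int s)
{
	struct pollfd pfd;
	char buf[CHUNK];
	ssize_t n;

	pfd.fd = s;
	pfd.events = POLLIN;

	for (;;) {
		/* Block until the kernel has data queued for us. */
		if (poll(&pfd, 1, -1) == -1)
			err(1, "poll");
		if ((pfd.revents & (POLLIN | POLLHUP)) == 0)
			continue;
		n = read(s, buf, sizeof(buf));
		if (n == -1)
			err(1, "read");
		if (n == 0)
			break;		/* peer closed the connection */
		/* Hand the chunk to the consumer; stdout is a stand-in. */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(STDOUT_FILENO, buf + off, n - off);
			if (w == -1)
				err(1, "write");
			off += w;
		}
	}
}

[If the consumer cannot keep up, the receive buffer fills and TCP's flow
control tells the sender to back off, which is the behavior Claudio
describes; a multi-megabyte in-kernel queue is not needed for that.]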
I think Claudio doesn't know that Step 1 in "solving" userland throughput
problems is to blame them on the kernel, hardware, drivers, or actually
anything except the application?  And I see the alternative "all my
problems would be solved if OpenBSD had feature X" (in this case real-time
support) is also used, so extra bonus points!

Anyway, I agree totally with Claudio here.  I've had to slap co-workers too
many times for various offenses like always setting TCP_NODELAY (or not
setting it when it needs to be set), or other inventive ways to "solve"
TCP throughput-related issues.

	-Otto
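[Editorial aside: for readers who haven't met the knob Otto is complaining
about, TCP_NODELAY disables Nagle's algorithm so small writes leave
immediately instead of being coalesced; whether that helps or hurts depends
entirely on the traffic pattern, which is why setting it blindly (or
forgetting it when it matters) counts as an offense.  A minimal sketch,
assuming a connected TCP socket; set_nodelay() is just an illustrative
wrapper name.]

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>

/*
 * Disable (on = 1) or re-enable (on = 0) Nagle's algorithm on a connected
 * TCP socket.  Disabling it favors latency-sensitive request/response
 * traffic and does nothing good for bulk transfers.
 */
int
set_nodelay(int s, int on)
{
	if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == -1) {
		perror("setsockopt(TCP_NODELAY)");
		return -1;
	}
	return 0;
}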