Re: New optimized soreceive_stream() for TCP sockets, proof of concept

Andre Oppermann Sat, 03 Mar 2007 15:22:32 -0800

Robert Watson wrote:

On Fri, 2 Mar 2007, Andre Oppermann wrote:
Instead of the unlock-lock dance soreceive_stream() pulls a properly sized (relative to the receive system call buffer space) amount of data from the socket buffer, drops the lock and gives copyout as much time as it needs. In the meantime the lower half can happily add as many new packets as it wants without having to wait for a lock. It also allows the upper and lower halves to run on different CPUs without much interference. There is an unsolved nasty race condition in the patch though. When the socket closes and we still have data around or the copyout failed it tries to put the data back into the socket buffer which is gone already by then, leading to a panic. Work is underway to find a reliable fix for this. I wanted to get this out to the community nonetheless to give it some more exposure.
I'll try to take a look at this in the next few days.
However, I find the description above of soreceive() a bit odd -- I'm pretty sure it doesn't do some of the things you're describing. For example, soreceive() does release the locks acquired by the network input processing path while copying to user space: there should be no contention during the copyout(), only while processing the socket buffer between copyout() calls. This is possible because the socket receive sleep lock (not the mutex) holds sb_mb constant if it is non-NULL, making copyout() of sb_mb->m_data safe while not holding the socket buffer mutex in the current implementation.


The copyout is done without holding the lock.  However, for every mbuf
in the socket buffer it unlocks, does the copyout and then locks again
for the next one.  I was referring to that unlock-lock pair for every mbuf.
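
The difference between the two locking patterns discussed above can be sketched in user space. This is only an illustration under stated assumptions, not the actual soreceive() or soreceive_stream() code: `struct chunk`, `struct rxbuf`, `drain_per_chunk()` and `drain_batched()` are hypothetical names, the lock is modeled as a plain counter rather than a real mutex, and memcpy() stands in for copyout().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: one chunk of received data. */
struct chunk {
    struct chunk *next;
    size_t len;
    char data[64];
};

/* Hypothetical stand-in for a socket receive buffer. The "lock" is only
 * a held flag plus a counter so the number of acquisitions is visible. */
struct rxbuf {
    struct chunk *head;
    int held;
    unsigned long lock_ops;
};

static void rx_lock(struct rxbuf *rb)   { assert(!rb->held); rb->held = 1; rb->lock_ops++; }
static void rx_unlock(struct rxbuf *rb) { assert(rb->held);  rb->held = 0; }

/* Link n caller-owned chunks of len bytes each (len <= 64) into the buffer. */
void rx_fill(struct rxbuf *rb, struct chunk *c, size_t n, size_t len)
{
    rb->head = n ? &c[0] : NULL;
    rb->held = 0;
    rb->lock_ops = 0;
    for (size_t i = 0; i < n; i++) {
        c[i].len = len;
        memset(c[i].data, 'x', len);
        c[i].next = (i + 1 < n) ? &c[i + 1] : NULL;
    }
}

/* Pattern A: one unlock-lock pair per chunk. The copy runs unlocked,
 * then the lock is retaken to unlink the chunk that was just copied. */
size_t drain_per_chunk(struct rxbuf *rb, char *dst, size_t space)
{
    size_t done = 0;

    rx_lock(rb);
    while (rb->head != NULL && rb->head->len <= space - done) {
        struct chunk *c = rb->head;
        rx_unlock(rb);
        memcpy(dst + done, c->data, c->len);   /* stands in for copyout() */
        rx_lock(rb);
        rb->head = c->next;
        done += c->len;
    }
    rx_unlock(rb);
    return done;
}

/* Pattern B: detach everything that fits under a single lock
 * acquisition, then copy the detached chain with the lock dropped. */
size_t drain_batched(struct rxbuf *rb, char *dst, size_t space)
{
    struct chunk *first, *last = NULL;
    size_t want = 0, done = 0;

    rx_lock(rb);
    first = rb->head;
    for (struct chunk *c = first; c != NULL && c->len <= space - want; c = c->next) {
        want += c->len;
        last = c;
    }
    if (last != NULL) {
        rb->head = last->next;   /* buffer keeps whatever did not fit */
        last->next = NULL;
    } else {
        first = NULL;
    }
    rx_unlock(rb);

    for (struct chunk *c = first; c != NULL; c = c->next) {
        memcpy(dst + done, c->data, c->len);   /* stands in for copyout() */
        done += c->len;
    }
    return done;
}
```

In this model, draining four chunks takes five lock acquisitions with pattern A and one with pattern B. The sketch deliberately leaves out the hard part mentioned earlier in the thread: what to do with the detached chain if the copy fails or the socket goes away in the meantime.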

In my experience, soreceive() is an incredibly complicated function, and could stand significant simplification. However, it has to be done very carefully for exactly this reason :-). There are some existing bugs in soreceive(), one involving incorrect handling of interlaced I/O due to a label being in the wrong place, that we should resolve.


It's damn complex.  That's one of the reasons I started the soreceive_stream()
function and related stuff.  To try to understand it and to document all the
evil edge cases right.  I'm pretty sure I've not accounted for some yet.

BTW, the point of not pulling the data out of the socket buffer until copyout() is complete is not error handling reversion so much as not changing the advertised window size until the copy is done, since the data isn't delivered to user space. Copyout() can take a very long time to run, due to page faults, for example, and the socket buffer represents a maximum bound on in-flight traffic as specified by the application. Whether this is a property we want to keep is another question, but I believe that's the rationale.


Haven't thought of that rationale yet.  So far it appeared to me that it
was done for sanity reasons and there wasn't really a need in the spl()
days to do it otherwise.  I'll think some more about it and whether it
is good, bad or doesn't matter.  Mind you this patch is just a pretty
advanced proof of concept thing.  Certainly not something to kick into
the tree by tomorrow or the day after.
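
The window rationale raised above can be illustrated with a small accounting model. This is a rough sketch under simplifying assumptions, not kernel code: the advertised window is taken to be the free space in the receive buffer, mbuf overhead and window scaling are ignored, and `struct rcvbuf`, `adv_window()`, `worst_case_held()` and `begin_copy_detached()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified receive buffer accounting. */
struct rcvbuf {
    size_t hiwat;     /* limit requested by the application */
    size_t cc;        /* bytes currently queued in the buffer */
    size_t detached;  /* bytes pulled out but not yet copied to user space */
};

/* Window offered to the peer: free space left in the buffer. */
size_t adv_window(const struct rcvbuf *sb)
{
    return sb->cc >= sb->hiwat ? 0 : sb->hiwat - sb->cc;
}

/* Bytes the receiver may end up holding if the peer fills the window
 * before the pending copy to user space has finished. */
size_t worst_case_held(const struct rcvbuf *sb)
{
    return sb->detached + sb->cc + adv_window(sb);
}

/* Pull up to n bytes out of the buffer before copying them. The buffer
 * looks emptier immediately, so the advertised window grows at once. */
void begin_copy_detached(struct rcvbuf *sb, size_t n)
{
    if (n > sb->cc)
        n = sb->cc;
    sb->cc -= n;
    sb->detached += n;
}
```

With a full 64 KB buffer, leaving the data queued during the copy keeps the window at zero and the worst case at 64 KB. Detaching 32 KB first reopens a 32 KB window while those bytes are still held, so the worst case in this model rises to 96 KB, which is above the bound the application asked for.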

--
Andre

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
