Robert Watson wrote:
On Fri, 2 Mar 2007, Andre Oppermann wrote:
Instead of the unlock-lock dance soreceive_stream() pulls a properly
sized (relative to the receive system call buffer space) from the
socket buffer drops the lock and gives copyout as much time as it
needs. In the mean time the lower half can happily add as many new
packets as it wants without having to wait for a lock. It also allows
the upper and lower halfs to run on different CPUs without much
interference. There is a unsolved nasty race condition in the patch
though. When the socket closes and we still have data around or the
copyout failed it tries to put the data back into the socket buffer
which is gone already by then leading to a panic. Work is underway to
find a realiable fix for this. I wanted to get this out to the
community nonetheless to give it some more exposure.
I'll try to take a look at this in the next few days.
However, I find the description above of soreceive() a bit odd -- I'm
pretty sure it doesn't do some of the things you're describing. For
example, soreceive() does release the locks acquired by the network
input processing path while copying to user space: there should be no
contention during the copyout(), only while processing the socket buffer
between copyout() calls. This is possible because the socket receive
sleep lock (not the mutex) holds sb_mb constant if it is non-NULL,
making copyout() of sb_mb->m_data safe while not holding the socket
buffer mutex in the current implementation.
The copyout is done without holding the lock. However for every mbuf
in the socket buffer it unlocks, does the copyout and then locks it again
for the next. I was referring to that unlock-lock pair for every mbuf.
In my experience, soreceive() is an incredibly complicated function, and
could stand significant simplification. However, it has to be done very
carefully for exactly this reason :-). There are some existing bugs in
soreceive(), one involving incorrect handling of interlaced I/O due to a
label being in the wrong place, that we should resolve.
It's damn complex. That's one of the reasons I started the soreceive_stream()
function and related stuff. To try to understand it and to document all the
evil edge cases right. I'm pretty sure I've not accounted for some yet.
BTW, the point of not pulling the data out of the socket buffer until
copyout() is complete is not error handling reversion so much as not
changing the advertised window size until the copy is done, since the
data isn't delivered to user space. Copyout() can take a very long time
to run, due to page faults, for example, and the socket buffer
represents a maximum bound on in-flight traffic as specified by the
application. Whether this is a property we want to keep is another
question, but I believe that's the rationale.
Haven't thought of that rationale yet. So far it appeared to me that it
was done for sanity reasons and there wasn't really a need in the spl()
days to do it otherwise. I'll think some more about it and whether it
is good, bad or doesn't matter. Mind you this patch is just a pretty
advanced proof of concept thing. Certainly not something to kick into
the tree by tomorrow or the day after.
--
Andre
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"