On Thu, 21 Sep 2006, Andre Oppermann wrote:
There should be an unconditional M_NOWAIT. Oops, the M_DONTWAIT in the current
code is incorrect. It has been present since rev. 1.171. If m_uiotombuf()
fails, the current code returns from the syscall without an error! Before rev.
1.171 there was no m_uiotombuf(); the mbuf header was allocated below,
with the correct wait argument.
The wait argument for m_uiotombuf() should be changed to M_WAITOK, but in a
separate commit.
<snip>
This one should be M_WAITOK always. It is M_TRYWAIT (equal to M_WAITOK) in
the current code.
The reason why I changed the mbuf allocations with SS_NBIO is the rationale
of sendfile() and the performance evaluation that was done by alc@ students.
sendfile() has two flags which control its blocking behavior: the non-blocking
socket state (SS_NBIO) and SF_NODISKIO. The latter is necessary because file
reads or writes are normally not considered to be blocking. The optimal
sendfile() usage is a single process doing accept(), parsing and then
a sendfile() that should never ever block on anything. This way the
main process can use kqueue for all the socket stuff and it can
transfer all sends that require disk I/O to a child process or thread to
provide a context for the read. Meanwhile the main process is free to
accept further connections and to continue serving existing connections.
Having sendfile() block in mbuf allocation for the header, on sfbufs or
anything else is not desirable and must be avoided. I know I'm extending
the traditional definition of SS_NBIO a bit but it's fully in line with the
semantics and desired operational behavior of sendfile(). The paper by
alc@'s students clearly identifies this as the main property of a sendfile
implementation besides its zero copy nature.
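For illustration, the dispatch described above might be sketched roughly as
follows. The sketch only classifies the outcome of a sendfile() call; the enum
and function names are made up for this example. It assumes the errno values
documented in FreeBSD's sendfile(2): EAGAIN for a non-blocking socket whose
buffer is full, and EBUSY when SF_NODISKIO is set and the transfer would have
required disk I/O.

```c
#include <errno.h>

/* Hypothetical names, for illustration only. */
enum next_step {
    STEP_DONE,             /* whole request was queued on the socket */
    STEP_WAIT_WRITABLE,    /* re-arm kqueue and retry when writable */
    STEP_HANDOFF_DISK_IO,  /* pass the request to a helper thread/process */
    STEP_FAIL              /* any other error: give up on this request */
};

/*
 * rv is the return value of sendfile() (0 or -1) and err is the errno
 * value observed right after the call.
 */
static enum next_step
classify_sendfile_result(int rv, int err)
{
    if (rv == 0)
        return STEP_DONE;

    switch (err) {
    case EAGAIN:    /* socket buffer full on a non-blocking socket */
        return STEP_WAIT_WRITABLE;
    case EBUSY:     /* SF_NODISKIO: pages not resident, disk I/O needed */
        return STEP_HANDOFF_DISK_IO;
    default:
        return STEP_FAIL;
    }
}
```

The main loop would call this after each sendfile() and keep servicing other
sockets in every case except STEP_DONE and STEP_FAIL, which end the request.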
The semantics with regard to waiting are a bit confusing, but the existing
model has a fairly specific meaning that has some benefits. Normally we have
three dispositions for a network I/O operation:
(1) Fully blocking -- the default disposition. The operation may block for
several reasons, but most usually due to either insufficient buffer
space/data in the socket buffer, insufficient memory for the kernel to
perform the operation (usually mbufs), or due to a user space page fault
in reading or writing the data.
(2) Non-blocking -- SS_NBIO, MSG_NBIO, etc. The operation will not block if
there is insufficient data/buffer space. Typically, this is aligned with
select()/poll()/kqueue()'s notion of data or space.
(3) Non-waiting -- MSG_DONTWAIT. The operation will not sleep in kernel for
any reason, either as part of I/O blocking, or for memory allocation. It
may still sleep if a page fault occurs, but as kernel senders send using
pinned kernel memory, this isn't an issue.
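As a rough summary, the three dispositions can be modelled as a small table.
This is a userland sketch and not kernel code: the enum, the struct and
policy_for() are hypothetical names, and page faults are deliberately left out
of the model.

```c
#include <stdbool.h>

/* Hypothetical names; a model of the list above, not kernel code. */
enum io_disposition {
    IO_BLOCKING,     /* (1) the default */
    IO_NONBLOCKING,  /* (2) SS_NBIO, MSG_NBIO */
    IO_NONWAITING    /* (3) MSG_DONTWAIT */
};

struct sleep_policy {
    bool may_block_on_sockbuf;  /* wait for socket buffer space or data */
    bool may_sleep_for_memory;  /* wait for kernel memory such as mbufs */
};

static struct sleep_policy
policy_for(enum io_disposition d)
{
    /* Start from the fully blocking case and remove permissions. */
    struct sleep_policy p = { true, true };

    switch (d) {
    case IO_BLOCKING:
        break;
    case IO_NONBLOCKING:
        /* Only the socket-buffer wait is suppressed. */
        p.may_block_on_sockbuf = false;
        break;
    case IO_NONWAITING:
        /* No sleeping in the kernel at all. */
        p.may_block_on_sockbuf = false;
        p.may_sleep_for_memory = false;
        break;
    }
    return p;
}
```

The point of the model is the middle case: a non-blocking operation skips the
socket-buffer wait but is still allowed to sleep for memory.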
There are a few known bugs -- for example, in zero-copy mode, we may block
waiting for an sf_buf with MSG_DONTWAIT set (this used to be the case, haven't
checked lately). However, for applications, you typically run in (1) or (2)
of the above, where the notion of blocking is aligned with a notion of buffer
space or data, not with a notion of kernel sleeping. In particular, it has to
do with the definition used by select()/kqueue()/poll(). If you make SS_NBIO
sockets return immediately if there is no memory free for sendfile(), this
will be inconsistent with the normal behavior in which select() returning
writable means that you will be able to write -- so an application that shows
the socket as writable via select() might sit there spinning performing the
I/O operation, with it repeatedly returning an error saying it wasn't ready.
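The spinning can be shown with a toy simulation. Everything here is
hypothetical: fake_socket, fake_writable() and fake_sendfile_nbio() merely
stand in for a socket, for select() reporting writability from socket buffer
space alone, and for a sendfile() that returns EAGAIN when no memory is
available.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket and the kernel's memory state. */
struct fake_socket {
    int sockbuf_space;  /* free space in the send buffer */
    int free_mbufs;     /* memory available for the operation */
};

/* Models select(): writability depends only on socket buffer space. */
static bool
fake_writable(const struct fake_socket *s)
{
    return s->sockbuf_space > 0;
}

/* Models a non-blocking sendfile() that errors out on memory shortage. */
static int
fake_sendfile_nbio(struct fake_socket *s)
{
    if (s->free_mbufs == 0) {
        errno = EAGAIN;
        return -1;
    }
    s->free_mbufs--;
    s->sockbuf_space--;
    return 0;
}

/* Counts attempts that fail even though the socket was reported writable. */
static int
count_wasted_attempts(struct fake_socket *s, int max_rounds)
{
    int wasted = 0;

    for (int i = 0; i < max_rounds; i++) {
        if (!fake_writable(s))
            break;              /* application would sleep in select() */
        if (fake_sendfile_nbio(s) == -1 && errno == EAGAIN)
            wasted++;           /* writable, yet "not ready": spin */
        else
            break;              /* the send went through */
    }
    return wasted;
}
```

With buffer space available but no memory, every round up to max_rounds is
wasted, because nothing in the event loop ever makes the application wait.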
My feeling is that we should constrain absolutely non-sleeping to the
MSG_DONTWAIT case -- if desired, we could add SF_DONTWAIT to control whether
sleeping ever happens at all. SS_NBIO should not return an error in a limited
memory case; it should sleep waiting on memory, as sleeping (mutexes, memory
allocation, ...) is not considered blocking. Blocking should continue to
refer to the socket buffer-related behavior, and specifically sbwait().
However, we should fix any bugs in MSG_DONTWAIT for sosend/soreceive (and
hence sendmsg, recvmsg) that cause it to sleep improperly -- I'm not sure if
the zero-copy case still does it wrong, but that's potentially a problem if we
ever support zero-copy send from kernel space, as sosend/soreceive can be
called while a mutex is held or in network interrupt context, hence needing
the flag.
Robert N M Watson
Computer Laboratory
University of Cambridge