On Tue, Jul 15, 2014 at 04:52:21PM +1200, Joe Stringer wrote: > On 15 July 2014 09:10, Ben Pfaff <b...@nicira.com> wrote: > > > Commit 0791315e4d (netlink-socket: Work around kernel Netlink dump thread > > races.) introduced a simple workaround for Linux kernel races in Netlink > > dumps. However, the code remained more complicated than needed. This > > commit simplifies it. > > > > The main reason for complication in the code was 'status_seq' in nl_dump. > > This member was there to allow a thread to wait for some other thread to > > refill the socket buffer with another dump message (although we did not > > understand the reason at the time it was introduced). Now that we know > > that Netlink dumps properly need to be serialized to work in existing > > Linux kernels, there's no additional value in having 'status_seq', > > because serialized recvmsg() calls always refill the socket buffer > > properly. > > > > This commit updates nl_msg_next() to clear its buffer argument on error. > > This is a more convenient interface for the new version of the Netlink > > dump code. nl_msg_next() doesn't have any other callers. > > > > Signed-off-by: Ben Pfaff <b...@nicira.com> > > > > I welcome a cleanup of this, it's a bit unwieldy as-is. > > > (snip) > > + > > +static int > > +nl_dump_refill__(struct nl_dump *dump, struct ofpbuf *buffer) > > > > Minor nitpick: Are the underscores necessary in this function name?
No. Removed. > > + OVS_REQUIRES(dump->mutex) > > +{ > > + struct nlmsghdr *nlmsghdr; > > + int error; > > + > > + while (!ofpbuf_size(buffer)) { > > + error = nl_sock_recv__(dump->sock, buffer, true); > > + if (error) { > > + return error == EAGAIN ? EOF : 0; > > + } > > > > Is this meant to differ considerably from the current behaviour? > > Two points: > * Current master, if EAGAIN is returned, will block on the socket as well > as on the seq, so if there's any new data or another thread sees the > final/error message, the thread will wake up. Here, we interpret it as EOF, > even if we don't see the netlink message with NLMSG_DONE set. (I don't know > if we might see EAGAIN from the kernel in normal circumstances now, but I > vaguely recall seeing this when multiple threads were dumping at once). The kernel shouldn't ever return EAGAIN if there's more data to dump. When dumping is single-threaded (as in branch-1.11) or serialized (as post-commit 0791315e4d), the kernel implements that correctly. When dumping is multithreaded, the kernel has races that screw it up, and each thread can get EAGAIN if it loses a race. (I tried to explain this in the commit message for commit 0791315e4d.) I'm pretty sure that this is OK because that's how OVS worked throughout the whole OVS 1.x series. > * Current master, if a different error code is returned, will store the > error code and stop dumping. This code appears to fall through and continue > reading from the buffer. Maybe the '0' is meant to be 'error'? Oops. Yes. I think that the other return could just pass along EAGAIN instead of converting it to EPROTO. > > (snip) > > + /* Fetch the next message from the buffer. */ > > + if (!retval) { > > + retval = nl_dump_next__(reply, buffer); > > if (retval) { > > - ofpbuf_clear(buffer); > > - if (retval == EAGAIN) { > > - nl_sock_wait(dump->sock, POLLIN); > > - seq_wait(dump->status_seq, seq); > > - poll_block(); > > - continue; > > - } else { > > - error = retval; > > - goto exit; > > + /* Record 'retval' as the dump status, but don't overwrite an > > error > > + * with EOF. */ > > + ovs_mutex_lock(&dump->mutex); > > + if (dump->status <= 0) { > > + dump->status = retval; > > } > > > > Is this assuming that we store -1 in dump->status to indicate EOF? I don't > see where that's done. EOF has a negative value (which is guaranteed by ANSI C and by POSIX), normally -1 (but that part isn't guaranteed). Here's an incremental, and I'll repost the whole thing: diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index 5578ce6..8f4ff9a 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -717,7 +717,7 @@ nl_dump_start(struct nl_dump *dump, int protocol, const struct ofpbuf *request) } static int -nl_dump_refill__(struct nl_dump *dump, struct ofpbuf *buffer) +nl_dump_refill(struct nl_dump *dump, struct ofpbuf *buffer) OVS_REQUIRES(dump->mutex) { struct nlmsghdr *nlmsghdr; @@ -726,7 +726,7 @@ nl_dump_refill__(struct nl_dump *dump, struct ofpbuf *buffer) while (!ofpbuf_size(buffer)) { error = nl_sock_recv__(dump->sock, buffer, true); if (error) { - return error == EAGAIN ? EOF : 0; + return error == EAGAIN ? EOF : error; } nlmsghdr = nl_msg_nlmsghdr(buffer); @@ -741,7 +741,7 @@ nl_dump_refill__(struct nl_dump *dump, struct ofpbuf *buffer) VLOG_INFO_RL(&rl, "netlink dump request error (%s)", ovs_strerror(error)); ofpbuf_clear(buffer); - return error == EAGAIN ? EPROTO : error; + return error; } return 0; @@ -800,7 +800,7 @@ nl_dump_next(struct nl_dump *dump, struct ofpbuf *reply, struct ofpbuf *buffer) * next recv on that socket, which could be anywhere due to the way * that we pool Netlink sockets. Serializing the recv calls avoids * the issue. */ - dump->status = nl_dump_refill__(dump, buffer); + dump->status = nl_dump_refill(dump, buffer); } retval = dump->status; ovs_mutex_unlock(&dump->mutex); _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev