On Tue, Jul 15, 2014 at 04:52:21PM +1200, Joe Stringer wrote:
> On 15 July 2014 09:10, Ben Pfaff <[email protected]> wrote:
>
> > Commit 0791315e4d (netlink-socket: Work around kernel Netlink dump thread
> > races.) introduced a simple workaround for Linux kernel races in Netlink
> > dumps. However, the code remained more complicated than needed. This
> > commit simplifies it.
> >
> > The main reason for complication in the code was 'status_seq' in nl_dump.
> > This member was there to allow a thread to wait for some other thread to
> > refill the socket buffer with another dump message (although we did not
> > understand the reason at the time it was introduced). Now that we know
> > that Netlink dumps properly need to be serialized to work in existing
> > Linux kernels, there's no additional value in having 'status_seq',
> > because serialized recvmsg() calls always refill the socket buffer
> > properly.
> >
> > This commit updates nl_msg_next() to clear its buffer argument on error.
> > This is a more convenient interface for the new version of the Netlink
> > dump code. nl_msg_next() doesn't have any other callers.
> >
> > Signed-off-by: Ben Pfaff <[email protected]>
> >
>
> I welcome a cleanup of this, it's a bit unwieldy as-is.
>
>
> (snip)
> > +
> > +static int
> > +nl_dump_refill__(struct nl_dump *dump, struct ofpbuf *buffer)
> >
>
> Minor nitpick: Are the underscores necessary in this function name?
No. Removed.
> > + OVS_REQUIRES(dump->mutex)
> > +{
> > + struct nlmsghdr *nlmsghdr;
> > + int error;
> > +
> > + while (!ofpbuf_size(buffer)) {
> > + error = nl_sock_recv__(dump->sock, buffer, true);
> > + if (error) {
> > + return error == EAGAIN ? EOF : 0;
> > + }
> >
>
> Is this meant to differ considerably from the current behaviour?
>
> Two points:
> * Current master, if EAGAIN is returned, will block on the socket as well
> as on the seq, so if there's any new data or another thread sees the
> final/error message, the thread will wake up. Here, we interpret it as EOF,
> even if we don't see the netlink message with NLMSG_DONE set. (I don't know
> if we might see EAGAIN from the kernel in normal circumstances now, but I
> vaguely recall seeing this when multiple threads were dumping at once).
The kernel shouldn't ever return EAGAIN if there's more data to dump.
When dumping is single-threaded (as in branch-1.11) or serialized (as
post-commit 0791315e4d), the kernel implements that correctly. When
dumping is multithreaded, the kernel has races that screw it up, and
each thread can get EAGAIN if it loses a race. (I tried to explain
this in the commit message for commit 0791315e4d.) I'm pretty sure
that this is OK because that's how OVS worked throughout the whole OVS
1.x series.
> * Current master, if a different error code is returned, will store the
> error code and stop dumping. This code appears to fall through and continue
> reading from the buffer. Maybe the '0' is meant to be 'error'?
Oops. Yes.
I think that the other return could just pass along EAGAIN instead of
converting it to EPROTO.
> > (snip)
> > + /* Fetch the next message from the buffer. */
> > + if (!retval) {
> > + retval = nl_dump_next__(reply, buffer);
> > if (retval) {
> > - ofpbuf_clear(buffer);
> > - if (retval == EAGAIN) {
> > - nl_sock_wait(dump->sock, POLLIN);
> > - seq_wait(dump->status_seq, seq);
> > - poll_block();
> > - continue;
> > - } else {
> > - error = retval;
> > - goto exit;
> > + /* Record 'retval' as the dump status, but don't overwrite an
> > error
> > + * with EOF. */
> > + ovs_mutex_lock(&dump->mutex);
> > + if (dump->status <= 0) {
> > + dump->status = retval;
> > }
> >
>
> Is this assuming that we store -1 in dump->status to indicate EOF? I don't
> see where that's done.
EOF has a negative value (which is guaranteed by ANSI C and by POSIX),
normally -1 (but that part isn't guaranteed).
Here's an incremental, and I'll repost the whole thing:
diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
index 5578ce6..8f4ff9a 100644
--- a/lib/netlink-socket.c
+++ b/lib/netlink-socket.c
@@ -717,7 +717,7 @@ nl_dump_start(struct nl_dump *dump, int protocol, const
struct ofpbuf *request)
}
static int
-nl_dump_refill__(struct nl_dump *dump, struct ofpbuf *buffer)
+nl_dump_refill(struct nl_dump *dump, struct ofpbuf *buffer)
OVS_REQUIRES(dump->mutex)
{
struct nlmsghdr *nlmsghdr;
@@ -726,7 +726,7 @@ nl_dump_refill__(struct nl_dump *dump, struct ofpbuf
*buffer)
while (!ofpbuf_size(buffer)) {
error = nl_sock_recv__(dump->sock, buffer, true);
if (error) {
- return error == EAGAIN ? EOF : 0;
+ return error == EAGAIN ? EOF : error;
}
nlmsghdr = nl_msg_nlmsghdr(buffer);
@@ -741,7 +741,7 @@ nl_dump_refill__(struct nl_dump *dump, struct ofpbuf
*buffer)
VLOG_INFO_RL(&rl, "netlink dump request error (%s)",
ovs_strerror(error));
ofpbuf_clear(buffer);
- return error == EAGAIN ? EPROTO : error;
+ return error;
}
return 0;
@@ -800,7 +800,7 @@ nl_dump_next(struct nl_dump *dump, struct ofpbuf *reply,
struct ofpbuf *buffer)
* next recv on that socket, which could be anywhere due to the way
* that we pool Netlink sockets. Serializing the recv calls avoids
* the issue. */
- dump->status = nl_dump_refill__(dump, buffer);
+ dump->status = nl_dump_refill(dump, buffer);
}
retval = dump->status;
ovs_mutex_unlock(&dump->mutex);
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev