Re: [ovs-dev] [PATCH 4/4] netlink-socket: Simplify multithreaded dumping to match Linux reality.

Ben Pfaff Tue, 15 Jul 2014 10:56:26 -0700

On Tue, Jul 15, 2014 at 04:52:21PM +1200, Joe Stringer wrote:
> On 15 July 2014 09:10, Ben Pfaff <b...@nicira.com> wrote:
> 
> > Commit 0791315e4d (netlink-socket: Work around kernel Netlink dump thread
> > races.) introduced a simple workaround for Linux kernel races in Netlink
> > dumps.  However, the code remained more complicated than needed.  This
> > commit simplifies it.
> >
> > The main reason for complication in the code was 'status_seq' in nl_dump.
> > This member was there to allow a thread to wait for some other thread to
> > refill the socket buffer with another dump message (although we did not
> > understand the reason at the time it was introduced).  Now that we know
> > that Netlink dumps properly need to be serialized to work in existing
> > Linux kernels, there's no additional value in having 'status_seq',
> > because serialized recvmsg() calls always refill the socket buffer
> > properly.
> >
> > This commit updates nl_msg_next() to clear its buffer argument on error.
> > This is a more convenient interface for the new version of the Netlink
> > dump code.  nl_msg_next() doesn't have any other callers.
> >
> > Signed-off-by: Ben Pfaff <b...@nicira.com>
> >
> 
> I welcome a cleanup of this, it's a bit unwieldy as-is.
> 
> 
> (snip)
> > +
> > +static int
> > +nl_dump_refill__(struct nl_dump *dump, struct ofpbuf *buffer)
> >
> 
> Minor nitpick: Are the underscores necessary in this function name?


No.  Removed.

> > +    OVS_REQUIRES(dump->mutex)
> > +{
> > +    struct nlmsghdr *nlmsghdr;
> > +    int error;
> > +
> > +    while (!ofpbuf_size(buffer)) {
> > +        error = nl_sock_recv__(dump->sock, buffer, true);
> > +        if (error) {
> > +            return error == EAGAIN ? EOF : 0;
> > +        }
> >
> 
> Is this meant to differ considerably from the current behaviour?
> 
> Two points:
> * Current master, if EAGAIN is returned, will block on the socket as well
> as on the seq, so if there's any new data or another thread sees the
> final/error message, the thread will wake up. Here, we interpret it as EOF,
> even if we don't see the netlink message with NLMSG_DONE set. (I don't know
> if we might see EAGAIN from the kernel in normal circumstances now, but I
> vaguely recall seeing this when multiple threads were dumping at once).

The kernel shouldn't ever return EAGAIN if there's more data to dump.
When dumping is single-threaded (as in branch-1.11) or serialized (as
post-commit 0791315e4d), the kernel implements that correctly.  When
dumping is multithreaded, the kernel has races that screw it up, and
each thread can get EAGAIN if it loses a race.  (I tried to explain
this in the commit message for commit 0791315e4d.)  I'm pretty sure
that this is OK because that's how OVS worked throughout the whole OVS
1.x series.

> * Current master, if a different error code is returned, will store the
> error code and stop dumping. This code appears to fall through and continue
> reading from the buffer. Maybe the '0' is meant to be 'error'?

Oops.  Yes.

I think that the other return could just pass along EAGAIN instead of
converting it to EPROTO.


> > (snip)
> > +    /* Fetch the next message from the buffer. */
> > +    if (!retval) {
> > +        retval = nl_dump_next__(reply, buffer);
> >          if (retval) {
> > -            ofpbuf_clear(buffer);
> > -            if (retval == EAGAIN) {
> > -                nl_sock_wait(dump->sock, POLLIN);
> > -                seq_wait(dump->status_seq, seq);
> > -                poll_block();
> > -                continue;
> > -            } else {
> > -                error = retval;
> > -                goto exit;
> > +            /* Record 'retval' as the dump status, but don't overwrite an
> > error
> > +             * with EOF.  */
> > +            ovs_mutex_lock(&dump->mutex);
> > +            if (dump->status <= 0) {
> > +                dump->status = retval;
> >              }
> >
> 
> Is this assuming that we store -1 in dump->status to indicate EOF? I don't
> see where that's done.

EOF has a negative value (which is guaranteed by ANSI C and by POSIX),
normally -1 (but that part isn't guaranteed).

Here's an incremental, and I'll repost the whole thing:

diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
index 5578ce6..8f4ff9a 100644
--- a/lib/netlink-socket.c
+++ b/lib/netlink-socket.c
@@ -717,7 +717,7 @@ nl_dump_start(struct nl_dump *dump, int protocol, const 
struct ofpbuf *request)
 }
 
 static int
-nl_dump_refill__(struct nl_dump *dump, struct ofpbuf *buffer)
+nl_dump_refill(struct nl_dump *dump, struct ofpbuf *buffer)
     OVS_REQUIRES(dump->mutex)
 {
     struct nlmsghdr *nlmsghdr;
@@ -726,7 +726,7 @@ nl_dump_refill__(struct nl_dump *dump, struct ofpbuf 
*buffer)
     while (!ofpbuf_size(buffer)) {
         error = nl_sock_recv__(dump->sock, buffer, true);
         if (error) {
-            return error == EAGAIN ? EOF : 0;
+            return error == EAGAIN ? EOF : error;
         }
 
         nlmsghdr = nl_msg_nlmsghdr(buffer);
@@ -741,7 +741,7 @@ nl_dump_refill__(struct nl_dump *dump, struct ofpbuf 
*buffer)
         VLOG_INFO_RL(&rl, "netlink dump request error (%s)",
                      ovs_strerror(error));
         ofpbuf_clear(buffer);
-        return error == EAGAIN ? EPROTO : error;
+        return error;
     }
 
     return 0;
@@ -800,7 +800,7 @@ nl_dump_next(struct nl_dump *dump, struct ofpbuf *reply, 
struct ofpbuf *buffer)
              * next recv on that socket, which could be anywhere due to the way
              * that we pool Netlink sockets.  Serializing the recv calls avoids
              * the issue. */
-            dump->status = nl_dump_refill__(dump, buffer);
+            dump->status = nl_dump_refill(dump, buffer);
         }
         retval = dump->status;
         ovs_mutex_unlock(&dump->mutex);
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 4/4] netlink-socket: Simplify multithreaded dumping to match Linux reality.

Reply via email to