On Thu, Sep 22, 2011 at 10:38 AM, Ben Pfaff <b...@nicira.com> wrote:
> On Wed, Sep 21, 2011 at 11:02:03PM -0700, Jesse Gross wrote:
>> On Wed, Sep 21, 2011 at 9:37 PM, Ben Pfaff <b...@nicira.com> wrote:
>> > On Mon, Sep 19, 2011 at 03:00:04PM -0700, Jesse Gross wrote:
>> >> In the future, the kernel will use unicast messages instead of
>> >> multicast to send upcalls.  As a result, we need to be able to
>> >> tell it where to direct the traffic.  This adds a function to expose
>> >> the Netlink pid of a socket so it can be included in messages to the
>> >> kernel.
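
(For concreteness, an accessor along these lines would do it.  The
struct layout and names below are an illustrative sketch, not the
actual OVS code:)

    #include <errno.h>
    #include <stdint.h>
    #include <sys/socket.h>
    #include <linux/netlink.h>

    struct nl_sock {
        int fd;
        uint32_t pid;       /* nl_pid the kernel assigned at bind time. */
    };

    /* Caches the kernel-assigned Netlink pid after bind().
     * Returns 0 on success, a positive errno value on failure. */
    static int
    nl_sock_cache_pid(struct nl_sock *sock)
    {
        struct sockaddr_nl local;
        socklen_t len = sizeof local;

        if (getsockname(sock->fd, (struct sockaddr *) &local, &len) < 0) {
            return errno;
        }
        sock->pid = local.nl_pid;
        return 0;
    }

    /* Exposes the pid so it can be embedded in messages that tell the
     * kernel where to direct unicast upcalls. */
    uint32_t
    nl_sock_pid(const struct nl_sock *sock)
    {
        return sock->pid;
    }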
>> >
>> > The Netlink socket library in OVS has logic to keep fds used for Netlink
>> > dump operations separate from fds used for multicast reception, because
>> > we don't want reading dump output to discard an arbitrary number of
>> > possibly interleaved multicast messages.
>> >
>> > Even though it's not technically multicast, this new way of receiving
>> > upcalls is still asynchronous and could still have the same issue as a
>> > socket subscribed to a multicast group.
>>
>> That's a good point, although you'll also have similar issues if you
>> execute transactions on the same socket, so it's still necessary to
>> segregate operations on a single socket (as is done here).  We could
>> try to deal with that too, although I'm not sure it's worth it, and
>> all these games with swapping FDs behind the scenes make me somewhat
>> nervous.
>
> The biggest reason for the fd-swapping games is to overcome an awkward
> limitation of Netlink dump operations, which is that you can only have
> one of them going on at a time on a single Netlink socket.  So either
> the Netlink clients like dpif-linux have to deal with this themselves,
> by keeping track of when a dump is going on and opening new sockets
> when another one starts, or the netlink library has to do it.  I think
> it's less error-prone to have the logic in just one place, instead of
> spreading it out among all the clients.
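
(To make the swapping concrete: the idea is roughly the following.
Everything here -- the helper names, the struct fields, the signature --
is a hypothetical sketch, not the real library code:)

    #include <stdbool.h>
    #include <stddef.h>

    struct nl_sock;

    /* Hypothetical helpers: open a second fd bound to the same Netlink
     * protocol, send a dump request, and track dump state. */
    struct nl_sock *nl_sock_clone(const struct nl_sock *);
    void nl_sock_send__(struct nl_sock *, const void *request, size_t len);
    bool nl_sock_dump_in_progress(const struct nl_sock *);
    void nl_sock_set_dump_in_progress(struct nl_sock *, bool);

    struct nl_dump {
        struct nl_sock *sock;   /* The fd this dump actually reads from. */
    };

    /* If 'sock' already has a dump running, quietly read the new dump
     * from a fresh fd, so clients never have to track concurrent dumps
     * themselves. */
    void
    nl_dump_start(struct nl_dump *dump, struct nl_sock *sock,
                  const void *request, size_t len)
    {
        if (nl_sock_dump_in_progress(sock)) {
            dump->sock = nl_sock_clone(sock);
        } else {
            nl_sock_set_dump_in_progress(sock, true);
            dump->sock = sock;
        }
        nl_sock_send__(dump->sock, request, len);
    }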
>
> Note that dpif-linux doesn't read all of a dump's output in one go.
> It exposes an iterator interface to its clients and reads additional
> Netlink packets as the client consumes the existing ones.  It would
> waste a lot of memory to consume the whole flow dump at once.
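
(The client-side pattern is an iterator loop along these lines; the
signatures approximate OVS's netlink-socket.h from memory and may not
match the current tree exactly:)

    /* Sketch: dump flows, consuming replies incrementally. */
    static int
    dump_all_flows(struct nl_sock *sock, const struct ofpbuf *request)
    {
        struct nl_dump dump;
        struct ofpbuf reply;

        nl_dump_start(&dump, sock, request);   /* Sends NLM_F_DUMP request. */
        while (nl_dump_next(&dump, &reply)) {
            /* Parse one flow from 'reply'.  Each iteration may read
             * another packet from the fd, so memory use stays bounded
             * instead of buffering the whole flow dump at once. */
        }
        return nl_dump_done(&dump);            /* 0, or a positive errno. */
    }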
>
> This problem is admittedly a bit of a corner case.  It's not too
> common to want to do two dumps at the same time.  (In my opinion,
> that's another good reason to centralize the logic; if it's
> distributed and redundant across clients that rarely encounter the
> problem, then if they ever do we'll be running untested code.)
>
> The next biggest reason is much less of a corner case: if there's an
> ongoing dump operation on a netlink fd, then simple request-reply
> transactions on the same fd will screw up the dump, by skipping past
> some (perhaps all) of the dump replies.  We could store those dump
> replies, but that requires an unbounded amount of memory.  Using a
> separate fd avoids the problem.
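
(The failure mode is easy to see in the receive loop of a transaction.
This is not the real code -- nl_sock_recv__ and the surrounding context
are hypothetical -- it just shows where the dump replies go:)

    /* Hypothetical: wait on 'sock' for the reply matching 'seq'. */
    for (;;) {
        struct ofpbuf *msg = nl_sock_recv__(sock);
        struct nlmsghdr *nlh = msg->data;

        if (nlh->nlmsg_seq == seq) {
            return msg;         /* Our transaction's reply. */
        }
        /* Anything else read from this fd -- including replies that
         * belong to an in-progress dump -- gets discarded here.
         * Buffering them instead would take unbounded memory; reading
         * the dump from a separate fd sidesteps the problem. */
        ofpbuf_delete(msg);
    }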
>
> Keeping multicast traffic separate from other traffic is way down the
> importance scale, since it's easy for clients to segregate the
> nl_socks they're using for transactions and dumps from the nl_socks
> they're using for receiving multicast traffic.  In fact, it looks like
> all the existing clients in OVS already do that.  I think that it is
> necessary for correctness, so I guess that we should simply declare
> that a socket that receives asynchronous messages must not be used for
> transactions or dumps.  Here's a patch that does that.  After this is
> applied, I agree that your original patch is fine.

Yeah, I understand the rationale for the FD-swapping, and I think it
makes sense in the context of transactions and dumps being intermixed.
The specific part that made me nervous is its use with multicast or
unicast upcalls, where it's important to have the right socket.  And
since existing users already segregate asynchronous usage, it's not
really worth the extra complexity to handle that case fully.  Your
patch below looks good to me.