On Thu, Dec 15, 2011 at 03:24:38PM -0800, Jesse Gross wrote:
> On Thu, Dec 15, 2011 at 1:34 PM, Ben Pfaff <b...@nicira.com> wrote:
> > On Thu, Dec 15, 2011 at 01:00:00PM -0800, Jesse Gross wrote:
> >> On Thu, Dec 15, 2011 at 12:34 PM, Ben Pfaff <b...@nicira.com> wrote:
> >> > A workaround would be to call synchronize_rcu() and send the genl
> >> > reply from some context that doesn't hold genl_lock, but that's
> >> > hardly elegant.  Also it means that the real reply would come after
> >> > the one generated automatically by af_netlink.c when NLM_F_ACK is
> >> > used, which violates the normal rules.
> >>
> >> Isn't that almost exactly the same as sending the message from the
> >> RCU callback (including the Netlink ordering issue)?
> >
> > In the "send from RCU callback" case, I intended that the normal
> > Netlink reply would be sent synchronously just after deletion, just as
> > it is now.  The ordering would therefore stay just as it is now.  Only
> > the broadcast reply would be deferred.
>
> What do you consider the "normal Netlink reply"?  There are basically
> 3 different pieces:
>
> * Multicast flow information to the relevant group (excluding the
>   requester if NLM_F_ECHO is specified).
> * Unicast flow information to the requester if NLM_F_ECHO is specified.
> * Unicast Netlink acknowledgement if NLM_F_ACK is specified or there
>   was an error.
>
> Are you talking about doing the last two as-is and moving the
> multicast to the flow reclamation (but actually sending it to everyone
> at that time)?
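For concreteness, the "multicast from flow reclamation" shape would look
roughly like the sketch below.  ovs_flow_multicast_destroyed() and the
sw_flow layout here are made-up placeholders, not the actual datapath
code; the only point is the call_rcu() structure.

    #include <linux/kernel.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct sw_flow {
            struct rcu_head rcu;
            /* key, actions, stats, ... */
    };

    /* Hypothetical helper: builds and sends the Netlink multicast that
     * carries the flow's final statistics. */
    void ovs_flow_multicast_destroyed(struct sw_flow *flow);

    static void flow_reclaim_cb(struct rcu_head *rcu)
    {
            struct sw_flow *flow = container_of(rcu, struct sw_flow, rcu);

            /* No reader can still be executing this flow: the statistics
             * are final and any upcalls it triggered are already queued,
             * so the multicast can be sent now (atomic context). */
            ovs_flow_multicast_destroyed(flow);
            kfree(flow);
    }

    /* Called after the flow has been unlinked from the flow table.  The
     * unicast NLM_F_ECHO reply and the NLM_F_ACK would still be sent
     * synchronously here, with approximate statistics. */
    static void flow_delete(struct sw_flow *flow)
    {
            call_rcu(&flow->rcu, flow_reclaim_cb);
    }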
Yes, that's roughly what I had in mind, and perhaps exactly, depending
on what "but actually sending it to everyone" means.  The idea is that
the immediate, synchronous reply for NLM_F_ECHO would give approximate
statistics, the later multicast would give exact statistics, and
receiving the multicast could be used as an indicator that any upcalls
generated by the flow have already been queued to userspace.

I assumed that one socket would be used for sending the request and
receiving the unicast reply and that another socket would be used for
receiving the multicasts, since we'd want to receive the two separately.

> > The "use synchronize_rcu() on the side then send the reply" case
> > would change the message ordering.
>
> I guess I don't see why you couldn't use exactly the same strategy as
> you choose above.

I assumed that an intended advantage of using synchronize_rcu() was
that all three forms

> > Could we just use the existing spinlock in sw_flow plus the 'dead'
> > variable to ensure that no packets go through a deleted flow after
> > it's deleted?  On the delete side, take the spinlock and set 'dead'.
> > On the fast path, take the spinlock and check 'dead' before executing
> > the actions, release the spinlock after executing the actions.  We
> > already have to take the spinlock for every packet anyway to update
> > the stats, so it's not an additional cost.  I doubt that there's any
> > parallel work for a given flow anyway (that would imply packet
> > reordering).  We would have to avoid recursive locking somehow.
>
> Hmm, I think that would probably work.  It's slightly convoluted but
> certainly much less so than the alternatives.  How would you get
> recursive locking?

Flow loops, e.g. through patch ports, were what came to mind.
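For reference, here is a minimal sketch of the spinlock-plus-'dead'
scheme described above.  The field and helper names are placeholders
rather than the real sw_flow definition; the point is that the per-flow
lock already taken for statistics also fences action execution against
deletion.

    #include <linux/skbuff.h>
    #include <linux/spinlock.h>

    struct sw_flow {
            spinlock_t lock;    /* already taken once per packet for stats */
            bool dead;
            /* key, actions, stats, ... */
    };

    /* Hypothetical helpers standing in for the real stats/actions code. */
    void update_stats(struct sw_flow *flow, struct sk_buff *skb);
    void execute_actions(struct sw_flow *flow, struct sk_buff *skb);

    /* Delete side (process context, so disable BHs): once this returns,
     * no packet will execute the flow's actions, because the fast path
     * checks 'dead' under the same lock. */
    static void flow_mark_dead(struct sw_flow *flow)
    {
            spin_lock_bh(&flow->lock);
            flow->dead = true;
            spin_unlock_bh(&flow->lock);
    }

    /* Fast path (softirq context): update stats and run actions only
     * while the flow is still live. */
    static void flow_execute(struct sw_flow *flow, struct sk_buff *skb)
    {
            spin_lock(&flow->lock);
            if (!flow->dead) {
                    update_stats(flow, skb);
                    execute_actions(flow, skb);
            }
            spin_unlock(&flow->lock);
    }

Holding the lock across execute_actions() is where the recursive-locking
worry comes from: if the actions re-enter the datapath through a patch
port and the packet ends up matching the same flow again, flow_execute()
would try to take a lock it already holds.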