Thanks, Andy and Jesse, for the confirmation. I rewrote the commit message and applied this to master and branch-1.11 as follows.
I'm really bleary-eyed tired. I hope the following is correct. Thanks, Ben. --8<--------------------------cut here-------------------------->8-- From: Andy Zhou <az...@nicira.com> Date: Fri, 2 Aug 2013 20:22:17 -0700 Subject: [PATCH] ofproto-dpif: avoid losing track of kernel flows upon reinstallation This commit fixes a problem whereby userspace can lose track of a flow installed in the kernel, instead believing that the flow is not installed. The most visible consequence of this bug was a message in the ovs-vswitchd log warning about an unexpected flow in the kernel. Other possible consequences included loss of statistics and failure to updates actions when the OpenFlow flow table changed. The problem arose in the following scenario. Suppose userspace sets up a kernel flow due to an arriving packet. Before kernel flow setup completes, another packet for that flow arrives. The kernel sends the new packet to userspace after userspace has completed processing the batch of packets that set up the flow. Userspace then attempts to reinstall the kernel flow. This fails with EEXIST, so userspace then marked the flow as not-installed, even though it was successfully installed before and remains installed. The next time userspace dumped the kernel flow table to gather statistics, it would complain about an unexpected flow and delete it. In practice, we have seen these messages with netperf TCP_CRR tests and UDP stream tests. This patch fixes the problem by changing userspace so that, once it successfully installs a flow in the kernel, it will not reinstall it when it sees another packet for the flow in userspace. This has the downside that, if something goes wrong and a flow disappears from the kernel (e.g. ovs-dpctl del-flows), then userspace won't reinstall it (until it tries to delete it). (This is in fact the reason why until now userspace reinstalled flows it knew it already installed.) Some more background may be warranted. There are two EEXIST error cases: 1. A subfacet was installed successfully in a previous (recent) batch. Now we've attempted to reinstall exactly the same subfacet in this batch. 2. A subfacet was installed successfully in a previous (recent) batch or earlier in the current batch. We've attempted to install a subfacet for an overlapping megaflow. Before megaflows, installation errors were ignored completely. Since megaflows were introduced, they have been handled by considering on any installation error that the given subfacet is not installed. This works well for case #2 but causes case #1 to yield unexpected flows, as described at the top of the commit message. This commit adds the wrinkle that we never try to reinstall exactly the same subfacet that we know we installed successfully earlier (and haven't deleted) unless its actions change. This ought to work just as well for case #2, and avoids the problem with case #1. Prepared with assistance from Ethan. Signed-off-by: Andy Zhou <az...@nicira.com> [b...@nicira.com rewrote the commit message] Signed-off-by: Ben Pfaff <b...@nicira.com> --- ofproto/ofproto-dpif.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index 3bf4fe3..29a93e6 100644 --- a/ofproto/ofproto-dpif.c +++ b/ofproto/ofproto-dpif.c @@ -3460,7 +3460,7 @@ handle_flow_miss_with_facet(struct flow_miss *miss, struct facet *facet, subfacet_update_stats(subfacet, stats); } - if (miss->upcall_type == DPIF_UC_MISS || subfacet->path != want_path) { + if (subfacet->path != want_path) { struct flow_miss_op *op = &ops[(*n_ops)++]; struct dpif_flow_put *put = &op->dpif_op.u.flow_put; -- 1.7.10.4 On Sat, Aug 03, 2013 at 12:01:51AM -0700, Andy Zhou wrote: > Yes, this is my understanding as well. > I did not mention the implication of mega flow in the commit message > because it is not the problem. But it is important in discussing about the > solution. > > Your summary provides a better and more complete picture. > > thanks for the review. > > > On Fri, Aug 2, 2013 at 11:36 PM, Ben Pfaff <b...@nicira.com> wrote: > > > On Fri, Aug 02, 2013 at 10:27:14PM -0700, Andy Zhou wrote: > > > In the case of mega flow would this cause multiple subfacets within a > > facet > > > to be considered installed? > > > > > > > > > On Fri, Aug 2, 2013 at 10:21 PM, Ben Pfaff <b...@nicira.com> wrote: > > > > > > > On Fri, Aug 02, 2013 at 08:22:17PM -0700, Andy Zhou wrote: > > > > > This patch prevents the same subfacet from trying to install the > > kernel > > > > > flow more than once. > > > > > > > > > > This is how a kernel flow can become unexpected to the user space. > > > > > When a incoming packet did not match any flow in the kernel, it > > > > > would be sent up to the user space. A subfacet would be created and > > > > > a corresponding kenrel flow installed. So far so good. > > > > > > > > > > Just before the kernel flow was installed, another packet of the same > > > > > flow were to arrive at the kernel, it would also be queued up > > > > > for user space. This time, user space would find the subfacet just > > > > > created (subfacet_create()) due of the first packet, and attempt > > > > > to install the same kernel flow again, but could not (kernel returns > > > > EEXIST). > > > > > User space would then mark the subfacet as not installed > > > > (SF_NOT_INSTALLED). > > > > > > > > Can we just ignore EEXIST in the error handling loop at the bottom of > > > > handle_miss_upcalls()? > > > > > > > > OK, now I'm starting to see the whole picture. > > > > There are two EEXIST error cases: > > > > 1. A subfacet was installed successfully in a previous (recent) > > batch. Now we've attempted to reinstall exactly the same > > subfacet in this batch. > > > > 2. A subfacet was installed successfully in a previous (recent) > > batch or earlier in the current batch. We've attempted to > > install a subfacet for an overlapping megaflow. > > > > So far we have two ideas for handling these: > > > > a) The current way, which is to consider on any installation > > error that a given subfacet is not installed. This works well > > for case #2 but causes case #1 to yield unexpected flows, as > > described in the commit message here. > > > > b) The way proposed in this patch, which is the same as a) with > > the added wrinkle that we never try to reinstall exactly the > > same subfacet that we know we installed successfully earlier > > (and haven't deleted) unless its actions change. This ought > > to work just as well as a) for case #2, and avoids the problem > > with case #1. > > > > The (minor) flaw in b) is that if something goes wrong and a flow > > disappears from the kernel (e.g. ovs-dpctl del-flows), then userspace > > won't reinstall it (until it tries to delete it). > > > > If I understand the patch properly now, then I'm willing to commit it. > > Is that right? > > > > Thanks, > > > > Ben. > > _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev