I found the actual cause and it is not related to this commit at all. Thanks Andy for the help.~
On Tue, Sep 3, 2013 at 3:19 PM, Andy Zhou <az...@nicira.com> wrote: > I will take a look. > > > On Tue, Sep 3, 2013 at 3:14 PM, Alex Wang <al...@nicira.com> wrote: > >> Hey All, >> >> This commit seems to prevent stp from working properly. >> >> I'll keep learning the code and trying to figure all out. But it would be >> helpful if anyone can help. >> >> My experiment configuration is: >> - VMA and VMB >> - create two gre tunnels (with different key) between eth3 of VMA and VMB >> (this creates the loop) >> - turn on stp using "ovs-vsctl set bridge br0 stp_enable=true" >> - ping from br0 on VMA to br0 on VMB >> >> Observation: >> - ping not work, >> - tcpdump show the packets looping between two tunnels, >> - dpctl dump-flows show incorrect flows, >> - if 'git reset' to previous commits, everything works, >> >> Kind Regards, >> Alex Wang, >> >> >> On Sat, Aug 3, 2013 at 7:05 AM, Andy Zhou <az...@nicira.com> wrote: >> >>> Thanks Ben for the late night review an improving the commit message. >>> >>> >>> On Sat, Aug 3, 2013 at 12:18 AM, Ben Pfaff <b...@nicira.com> wrote: >>> >>>> Thanks, Andy and Jesse, for the confirmation. I rewrote the commit >>>> message and applied this to master and branch-1.11 as follows. >>>> >>>> I'm really bleary-eyed tired. I hope the following is correct. >>>> >>>> Thanks, >>>> >>>> Ben. >>>> >>>> --8<--------------------------cut here-------------------------->8-- >>>> >>>> From: Andy Zhou <az...@nicira.com> >>>> Date: Fri, 2 Aug 2013 20:22:17 -0700 >>>> Subject: [PATCH] ofproto-dpif: avoid losing track of kernel flows upon >>>> reinstallation >>>> >>>> This commit fixes a problem whereby userspace can lose track of a >>>> flow installed in the kernel, instead believing that the flow is >>>> not installed. The most visible consequence of this bug was a >>>> message in the ovs-vswitchd log warning about an unexpected flow >>>> in the kernel. Other possible consequences included loss of >>>> statistics and failure to updates actions when the OpenFlow flow >>>> table changed. >>>> >>>> The problem arose in the following scenario. Suppose userspace >>>> sets up a kernel flow due to an arriving packet. Before kernel >>>> flow setup completes, another packet for that flow arrives. The >>>> kernel sends the new packet to userspace after userspace has >>>> completed processing the batch of packets that set up the flow. >>>> Userspace then attempts to reinstall the kernel flow. This fails >>>> with EEXIST, so userspace then marked the flow as not-installed, >>>> even though it was successfully installed before and remains >>>> installed. The next time userspace dumped the kernel flow >>>> table to gather statistics, it would complain about an unexpected >>>> flow and delete it. >>>> >>>> In practice, we have seen these messages with netperf TCP_CRR tests and >>>> UDP stream tests. >>>> >>>> This patch fixes the problem by changing userspace so that, once >>>> it successfully installs a flow in the kernel, it will not reinstall >>>> it when it sees another packet for the flow in userspace. This >>>> has the downside that, if something goes wrong and a flow >>>> disappears from the kernel (e.g. ovs-dpctl del-flows), then userspace >>>> won't reinstall it (until it tries to delete it). (This is in fact >>>> the reason why until now userspace reinstalled flows it knew it >>>> already installed.) >>>> >>>> Some more background may be warranted. There are two EEXIST error >>>> cases: >>>> >>>> 1. A subfacet was installed successfully in a previous (recent) >>>> batch. Now we've attempted to reinstall exactly the same >>>> subfacet in this batch. >>>> >>>> 2. A subfacet was installed successfully in a previous (recent) >>>> batch or earlier in the current batch. We've attempted to >>>> install a subfacet for an overlapping megaflow. >>>> >>>> Before megaflows, installation errors were ignored completely. >>>> Since megaflows were introduced, they have been handled by >>>> considering on any installation error that the given subfacet is >>>> not installed. This works well for case #2 but causes case #1 to >>>> yield unexpected flows, as described at the top of the commit >>>> message. >>>> >>>> This commit adds the wrinkle that we never try to reinstall >>>> exactly the same subfacet that we know we installed successfully >>>> earlier (and haven't deleted) unless its actions change. This >>>> ought to work just as well for case #2, and avoids the problem >>>> with case #1. >>>> >>>> Prepared with assistance from Ethan. >>>> >>>> Signed-off-by: Andy Zhou <az...@nicira.com> >>>> [b...@nicira.com rewrote the commit message] >>>> Signed-off-by: Ben Pfaff <b...@nicira.com> >>>> --- >>>> ofproto/ofproto-dpif.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c >>>> index 3bf4fe3..29a93e6 100644 >>>> --- a/ofproto/ofproto-dpif.c >>>> +++ b/ofproto/ofproto-dpif.c >>>> @@ -3460,7 +3460,7 @@ handle_flow_miss_with_facet(struct flow_miss >>>> *miss, struct facet *facet, >>>> subfacet_update_stats(subfacet, stats); >>>> } >>>> >>>> - if (miss->upcall_type == DPIF_UC_MISS || subfacet->path != >>>> want_path) { >>>> + if (subfacet->path != want_path) { >>>> struct flow_miss_op *op = &ops[(*n_ops)++]; >>>> struct dpif_flow_put *put = &op->dpif_op.u.flow_put; >>>> >>>> -- >>>> 1.7.10.4 >>>> >>>> >>>> >>>> On Sat, Aug 03, 2013 at 12:01:51AM -0700, Andy Zhou wrote: >>>> > Yes, this is my understanding as well. >>>> > I did not mention the implication of mega flow in the commit message >>>> > because it is not the problem. But it is important in discussing >>>> about the >>>> > solution. >>>> > >>>> > Your summary provides a better and more complete picture. >>>> > >>>> > thanks for the review. >>>> > >>>> > >>>> > On Fri, Aug 2, 2013 at 11:36 PM, Ben Pfaff <b...@nicira.com> wrote: >>>> > >>>> > > On Fri, Aug 02, 2013 at 10:27:14PM -0700, Andy Zhou wrote: >>>> > > > In the case of mega flow would this cause multiple subfacets >>>> within a >>>> > > facet >>>> > > > to be considered installed? >>>> > > > >>>> > > > >>>> > > > On Fri, Aug 2, 2013 at 10:21 PM, Ben Pfaff <b...@nicira.com> >>>> wrote: >>>> > > > >>>> > > > > On Fri, Aug 02, 2013 at 08:22:17PM -0700, Andy Zhou wrote: >>>> > > > > > This patch prevents the same subfacet from trying to install >>>> the >>>> > > kernel >>>> > > > > > flow more than once. >>>> > > > > > >>>> > > > > > This is how a kernel flow can become unexpected to the user >>>> space. >>>> > > > > > When a incoming packet did not match any flow in the kernel, >>>> it >>>> > > > > > would be sent up to the user space. A subfacet would be >>>> created and >>>> > > > > > a corresponding kenrel flow installed. So far so good. >>>> > > > > > >>>> > > > > > Just before the kernel flow was installed, another packet of >>>> the same >>>> > > > > > flow were to arrive at the kernel, it would also be queued up >>>> > > > > > for user space. This time, user space would find the >>>> subfacet just >>>> > > > > > created (subfacet_create()) due of the first packet, and >>>> attempt >>>> > > > > > to install the same kernel flow again, but could not (kernel >>>> returns >>>> > > > > EEXIST). >>>> > > > > > User space would then mark the subfacet as not installed >>>> > > > > (SF_NOT_INSTALLED). >>>> > > > > >>>> > > > > Can we just ignore EEXIST in the error handling loop at the >>>> bottom of >>>> > > > > handle_miss_upcalls()? >>>> > > > > >>>> > > >>>> > > OK, now I'm starting to see the whole picture. >>>> > > >>>> > > There are two EEXIST error cases: >>>> > > >>>> > > 1. A subfacet was installed successfully in a previous >>>> (recent) >>>> > > batch. Now we've attempted to reinstall exactly the same >>>> > > subfacet in this batch. >>>> > > >>>> > > 2. A subfacet was installed successfully in a previous >>>> (recent) >>>> > > batch or earlier in the current batch. We've attempted to >>>> > > install a subfacet for an overlapping megaflow. >>>> > > >>>> > > So far we have two ideas for handling these: >>>> > > >>>> > > a) The current way, which is to consider on any installation >>>> > > error that a given subfacet is not installed. This works >>>> well >>>> > > for case #2 but causes case #1 to yield unexpected flows, >>>> as >>>> > > described in the commit message here. >>>> > > >>>> > > b) The way proposed in this patch, which is the same as a) >>>> with >>>> > > the added wrinkle that we never try to reinstall exactly >>>> the >>>> > > same subfacet that we know we installed successfully >>>> earlier >>>> > > (and haven't deleted) unless its actions change. This >>>> ought >>>> > > to work just as well as a) for case #2, and avoids the >>>> problem >>>> > > with case #1. >>>> > > >>>> > > The (minor) flaw in b) is that if something goes wrong and a flow >>>> > > disappears from the kernel (e.g. ovs-dpctl del-flows), then >>>> userspace >>>> > > won't reinstall it (until it tries to delete it). >>>> > > >>>> > > If I understand the patch properly now, then I'm willing to commit >>>> it. >>>> > > Is that right? >>>> > > >>>> > > Thanks, >>>> > > >>>> > > Ben. >>>> > > >>>> >>> >>> >>> _______________________________________________ >>> dev mailing list >>> dev@openvswitch.org >>> http://openvswitch.org/mailman/listinfo/dev >>> >>> >> >
_______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev