I found the actual cause and it is not related to this commit at all.

Thanks Andy for the help.~


On Tue, Sep 3, 2013 at 3:19 PM, Andy Zhou <az...@nicira.com> wrote:

> I will take a look.
>
>
> On Tue, Sep 3, 2013 at 3:14 PM, Alex Wang <al...@nicira.com> wrote:
>
>> Hey All,
>>
>> This commit seems to prevent stp from working properly.
>>
>> I'll keep learning the code and trying to figure all out. But it would be
>> helpful if anyone can help.
>>
>> My experiment configuration is:
>> - VMA and VMB
>> - create two gre tunnels (with different key) between eth3 of VMA and VMB
>>   (this creates the loop)
>> - turn on stp using "ovs-vsctl set bridge br0 stp_enable=true"
>> - ping from br0 on VMA to br0 on VMB
>>
>> Observation:
>> - ping not work,
>> - tcpdump show the packets looping between two tunnels,
>> - dpctl dump-flows show incorrect flows,
>> - if 'git reset' to previous commits, everything works,
>>
>> Kind Regards,
>> Alex Wang,
>>
>>
>> On Sat, Aug 3, 2013 at 7:05 AM, Andy Zhou <az...@nicira.com> wrote:
>>
>>> Thanks Ben for the late night review an improving the commit message.
>>>
>>>
>>> On Sat, Aug 3, 2013 at 12:18 AM, Ben Pfaff <b...@nicira.com> wrote:
>>>
>>>> Thanks, Andy and Jesse, for the confirmation.  I rewrote the commit
>>>> message and applied this to master and branch-1.11 as follows.
>>>>
>>>> I'm really bleary-eyed tired.  I hope the following is correct.
>>>>
>>>> Thanks,
>>>>
>>>> Ben.
>>>>
>>>> --8<--------------------------cut here-------------------------->8--
>>>>
>>>> From: Andy Zhou <az...@nicira.com>
>>>> Date: Fri, 2 Aug 2013 20:22:17 -0700
>>>> Subject: [PATCH] ofproto-dpif: avoid losing track of kernel flows upon
>>>>  reinstallation
>>>>
>>>> This commit fixes a problem whereby userspace can lose track of a
>>>> flow installed in the kernel, instead believing that the flow is
>>>> not installed.  The most visible consequence of this bug was a
>>>> message in the ovs-vswitchd log warning about an unexpected flow
>>>> in the kernel.  Other possible consequences included loss of
>>>> statistics and failure to updates actions when the OpenFlow flow
>>>> table changed.
>>>>
>>>> The problem arose in the following scenario.  Suppose userspace
>>>> sets up a kernel flow due to an arriving packet.  Before kernel
>>>> flow setup completes, another packet for that flow arrives.  The
>>>> kernel sends the new packet to userspace after userspace has
>>>> completed processing the batch of packets that set up the flow.
>>>> Userspace then attempts to reinstall the kernel flow.  This fails
>>>> with EEXIST, so userspace then marked the flow as not-installed,
>>>> even though it was successfully installed before and remains
>>>> installed.  The next time userspace dumped the kernel flow
>>>> table to gather statistics, it would complain about an unexpected
>>>> flow and delete it.
>>>>
>>>> In practice, we have seen these messages with netperf TCP_CRR tests and
>>>> UDP stream tests.
>>>>
>>>> This patch fixes the problem by changing userspace so that, once
>>>> it successfully installs a flow in the kernel, it will not reinstall
>>>> it when it sees another packet for the flow in userspace.  This
>>>> has the downside that, if something goes wrong and a flow
>>>> disappears from the kernel (e.g. ovs-dpctl del-flows), then userspace
>>>> won't reinstall it (until it tries to delete it).  (This is in fact
>>>> the reason why until now userspace reinstalled flows it knew it
>>>> already installed.)
>>>>
>>>> Some more background may be warranted.  There are two EEXIST error
>>>> cases:
>>>>
>>>>        1. A subfacet was installed successfully in a previous (recent)
>>>>           batch.  Now we've attempted to reinstall exactly the same
>>>>           subfacet in this batch.
>>>>
>>>>        2. A subfacet was installed successfully in a previous (recent)
>>>>           batch or earlier in the current batch.  We've attempted to
>>>>           install a subfacet for an overlapping megaflow.
>>>>
>>>> Before megaflows, installation errors were ignored completely.
>>>> Since megaflows were introduced, they have been handled by
>>>> considering on any installation error that the given subfacet is
>>>> not installed.  This works well for case #2 but causes case #1 to
>>>> yield unexpected flows, as described at the top of the commit
>>>> message.
>>>>
>>>> This commit adds the wrinkle that we never try to reinstall
>>>> exactly the same subfacet that we know we installed successfully
>>>> earlier (and haven't deleted) unless its actions change.  This
>>>> ought to work just as well for case #2, and avoids the problem
>>>> with case #1.
>>>>
>>>> Prepared with assistance from Ethan.
>>>>
>>>> Signed-off-by: Andy Zhou <az...@nicira.com>
>>>> [b...@nicira.com rewrote the commit message]
>>>> Signed-off-by: Ben Pfaff <b...@nicira.com>
>>>> ---
>>>>  ofproto/ofproto-dpif.c |    2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
>>>> index 3bf4fe3..29a93e6 100644
>>>> --- a/ofproto/ofproto-dpif.c
>>>> +++ b/ofproto/ofproto-dpif.c
>>>> @@ -3460,7 +3460,7 @@ handle_flow_miss_with_facet(struct flow_miss
>>>> *miss, struct facet *facet,
>>>>          subfacet_update_stats(subfacet, stats);
>>>>      }
>>>>
>>>> -    if (miss->upcall_type == DPIF_UC_MISS || subfacet->path !=
>>>> want_path) {
>>>> +    if (subfacet->path != want_path) {
>>>>          struct flow_miss_op *op = &ops[(*n_ops)++];
>>>>          struct dpif_flow_put *put = &op->dpif_op.u.flow_put;
>>>>
>>>> --
>>>> 1.7.10.4
>>>>
>>>>
>>>>
>>>> On Sat, Aug 03, 2013 at 12:01:51AM -0700, Andy Zhou wrote:
>>>> > Yes, this is my understanding as well.
>>>> > I did not mention the implication of mega flow in the commit message
>>>> > because it is not the problem. But it is important in discussing
>>>> about the
>>>> > solution.
>>>> >
>>>> > Your summary provides a better and more complete picture.
>>>> >
>>>> > thanks for the review.
>>>> >
>>>> >
>>>> > On Fri, Aug 2, 2013 at 11:36 PM, Ben Pfaff <b...@nicira.com> wrote:
>>>> >
>>>> > > On Fri, Aug 02, 2013 at 10:27:14PM -0700, Andy Zhou wrote:
>>>> > > > In the case of mega flow would this cause multiple subfacets
>>>> within a
>>>> > > facet
>>>> > > > to be considered installed?
>>>> > > >
>>>> > > >
>>>> > > > On Fri, Aug 2, 2013 at 10:21 PM, Ben Pfaff <b...@nicira.com>
>>>> wrote:
>>>> > > >
>>>> > > > > On Fri, Aug 02, 2013 at 08:22:17PM -0700, Andy Zhou wrote:
>>>> > > > > > This patch prevents the same subfacet from trying to install
>>>> the
>>>> > > kernel
>>>> > > > > > flow more than once.
>>>> > > > > >
>>>> > > > > > This is how a kernel flow can become unexpected to the user
>>>> space.
>>>> > > > > > When a incoming packet did not match any flow in the kernel,
>>>> it
>>>> > > > > > would be sent up to the user space. A subfacet would be
>>>> created and
>>>> > > > > > a corresponding kenrel flow installed. So far so good.
>>>> > > > > >
>>>> > > > > > Just before the kernel flow was installed, another packet of
>>>> the same
>>>> > > > > > flow were to arrive at the kernel, it would also be queued up
>>>> > > > > > for user space.  This time, user space would find the
>>>> subfacet just
>>>> > > > > > created (subfacet_create()) due of the first packet, and
>>>> attempt
>>>> > > > > > to install the same kernel flow again, but could not (kernel
>>>> returns
>>>> > > > > EEXIST).
>>>> > > > > > User space would then mark the subfacet as not installed
>>>> > > > > (SF_NOT_INSTALLED).
>>>> > > > >
>>>> > > > > Can we just ignore EEXIST in the error handling loop at the
>>>> bottom of
>>>> > > > > handle_miss_upcalls()?
>>>> > > > >
>>>> > >
>>>> > > OK, now I'm starting to see the whole picture.
>>>> > >
>>>> > > There are two EEXIST error cases:
>>>> > >
>>>> > >        1. A subfacet was installed successfully in a previous
>>>> (recent)
>>>> > >           batch.  Now we've attempted to reinstall exactly the same
>>>> > >           subfacet in this batch.
>>>> > >
>>>> > >        2. A subfacet was installed successfully in a previous
>>>> (recent)
>>>> > >           batch or earlier in the current batch.  We've attempted to
>>>> > >           install a subfacet for an overlapping megaflow.
>>>> > >
>>>> > > So far we have two ideas for handling these:
>>>> > >
>>>> > >        a) The current way, which is to consider on any installation
>>>> > >           error that a given subfacet is not installed.  This works
>>>> well
>>>> > >           for case #2 but causes case #1 to yield unexpected flows,
>>>> as
>>>> > >           described in the commit message here.
>>>> > >
>>>> > >        b) The way proposed in this patch, which is the same as a)
>>>> with
>>>> > >           the added wrinkle that we never try to reinstall exactly
>>>> the
>>>> > >           same subfacet that we know we installed successfully
>>>> earlier
>>>> > >           (and haven't deleted) unless its actions change.  This
>>>> ought
>>>> > >           to work just as well as a) for case #2, and avoids the
>>>> problem
>>>> > >           with case #1.
>>>> > >
>>>> > > The (minor) flaw in b) is that if something goes wrong and a flow
>>>> > > disappears from the kernel (e.g. ovs-dpctl del-flows), then
>>>> userspace
>>>> > > won't reinstall it (until it tries to delete it).
>>>> > >
>>>> > > If I understand the patch properly now, then I'm willing to commit
>>>> it.
>>>> > > Is that right?
>>>> > >
>>>> > > Thanks,
>>>> > >
>>>> > > Ben.
>>>> > >
>>>>
>>>
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> http://openvswitch.org/mailman/listinfo/dev
>>>
>>>
>>
>
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to