Hey All,

This commit seems to prevent stp from working properly.

I'll keep learning the code and trying to figure all out. But it would be
helpful if anyone can help.

My experiment configuration is:
- VMA and VMB
- create two gre tunnels (with different key) between eth3 of VMA and VMB
  (this creates the loop)
- turn on stp using "ovs-vsctl set bridge br0 stp_enable=true"
- ping from br0 on VMA to br0 on VMB

Observation:
- ping not work,
- tcpdump show the packets looping between two tunnels,
- dpctl dump-flows show incorrect flows,
- if 'git reset' to previous commits, everything works,

Kind Regards,
Alex Wang,


On Sat, Aug 3, 2013 at 7:05 AM, Andy Zhou <az...@nicira.com> wrote:

> Thanks Ben for the late night review an improving the commit message.
>
>
> On Sat, Aug 3, 2013 at 12:18 AM, Ben Pfaff <b...@nicira.com> wrote:
>
>> Thanks, Andy and Jesse, for the confirmation.  I rewrote the commit
>> message and applied this to master and branch-1.11 as follows.
>>
>> I'm really bleary-eyed tired.  I hope the following is correct.
>>
>> Thanks,
>>
>> Ben.
>>
>> --8<--------------------------cut here-------------------------->8--
>>
>> From: Andy Zhou <az...@nicira.com>
>> Date: Fri, 2 Aug 2013 20:22:17 -0700
>> Subject: [PATCH] ofproto-dpif: avoid losing track of kernel flows upon
>>  reinstallation
>>
>> This commit fixes a problem whereby userspace can lose track of a
>> flow installed in the kernel, instead believing that the flow is
>> not installed.  The most visible consequence of this bug was a
>> message in the ovs-vswitchd log warning about an unexpected flow
>> in the kernel.  Other possible consequences included loss of
>> statistics and failure to updates actions when the OpenFlow flow
>> table changed.
>>
>> The problem arose in the following scenario.  Suppose userspace
>> sets up a kernel flow due to an arriving packet.  Before kernel
>> flow setup completes, another packet for that flow arrives.  The
>> kernel sends the new packet to userspace after userspace has
>> completed processing the batch of packets that set up the flow.
>> Userspace then attempts to reinstall the kernel flow.  This fails
>> with EEXIST, so userspace then marked the flow as not-installed,
>> even though it was successfully installed before and remains
>> installed.  The next time userspace dumped the kernel flow
>> table to gather statistics, it would complain about an unexpected
>> flow and delete it.
>>
>> In practice, we have seen these messages with netperf TCP_CRR tests and
>> UDP stream tests.
>>
>> This patch fixes the problem by changing userspace so that, once
>> it successfully installs a flow in the kernel, it will not reinstall
>> it when it sees another packet for the flow in userspace.  This
>> has the downside that, if something goes wrong and a flow
>> disappears from the kernel (e.g. ovs-dpctl del-flows), then userspace
>> won't reinstall it (until it tries to delete it).  (This is in fact
>> the reason why until now userspace reinstalled flows it knew it
>> already installed.)
>>
>> Some more background may be warranted.  There are two EEXIST error
>> cases:
>>
>>        1. A subfacet was installed successfully in a previous (recent)
>>           batch.  Now we've attempted to reinstall exactly the same
>>           subfacet in this batch.
>>
>>        2. A subfacet was installed successfully in a previous (recent)
>>           batch or earlier in the current batch.  We've attempted to
>>           install a subfacet for an overlapping megaflow.
>>
>> Before megaflows, installation errors were ignored completely.
>> Since megaflows were introduced, they have been handled by
>> considering on any installation error that the given subfacet is
>> not installed.  This works well for case #2 but causes case #1 to
>> yield unexpected flows, as described at the top of the commit
>> message.
>>
>> This commit adds the wrinkle that we never try to reinstall
>> exactly the same subfacet that we know we installed successfully
>> earlier (and haven't deleted) unless its actions change.  This
>> ought to work just as well for case #2, and avoids the problem
>> with case #1.
>>
>> Prepared with assistance from Ethan.
>>
>> Signed-off-by: Andy Zhou <az...@nicira.com>
>> [b...@nicira.com rewrote the commit message]
>> Signed-off-by: Ben Pfaff <b...@nicira.com>
>> ---
>>  ofproto/ofproto-dpif.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
>> index 3bf4fe3..29a93e6 100644
>> --- a/ofproto/ofproto-dpif.c
>> +++ b/ofproto/ofproto-dpif.c
>> @@ -3460,7 +3460,7 @@ handle_flow_miss_with_facet(struct flow_miss *miss,
>> struct facet *facet,
>>          subfacet_update_stats(subfacet, stats);
>>      }
>>
>> -    if (miss->upcall_type == DPIF_UC_MISS || subfacet->path !=
>> want_path) {
>> +    if (subfacet->path != want_path) {
>>          struct flow_miss_op *op = &ops[(*n_ops)++];
>>          struct dpif_flow_put *put = &op->dpif_op.u.flow_put;
>>
>> --
>> 1.7.10.4
>>
>>
>>
>> On Sat, Aug 03, 2013 at 12:01:51AM -0700, Andy Zhou wrote:
>> > Yes, this is my understanding as well.
>> > I did not mention the implication of mega flow in the commit message
>> > because it is not the problem. But it is important in discussing about
>> the
>> > solution.
>> >
>> > Your summary provides a better and more complete picture.
>> >
>> > thanks for the review.
>> >
>> >
>> > On Fri, Aug 2, 2013 at 11:36 PM, Ben Pfaff <b...@nicira.com> wrote:
>> >
>> > > On Fri, Aug 02, 2013 at 10:27:14PM -0700, Andy Zhou wrote:
>> > > > In the case of mega flow would this cause multiple subfacets within
>> a
>> > > facet
>> > > > to be considered installed?
>> > > >
>> > > >
>> > > > On Fri, Aug 2, 2013 at 10:21 PM, Ben Pfaff <b...@nicira.com> wrote:
>> > > >
>> > > > > On Fri, Aug 02, 2013 at 08:22:17PM -0700, Andy Zhou wrote:
>> > > > > > This patch prevents the same subfacet from trying to install the
>> > > kernel
>> > > > > > flow more than once.
>> > > > > >
>> > > > > > This is how a kernel flow can become unexpected to the user
>> space.
>> > > > > > When a incoming packet did not match any flow in the kernel, it
>> > > > > > would be sent up to the user space. A subfacet would be created
>> and
>> > > > > > a corresponding kenrel flow installed. So far so good.
>> > > > > >
>> > > > > > Just before the kernel flow was installed, another packet of
>> the same
>> > > > > > flow were to arrive at the kernel, it would also be queued up
>> > > > > > for user space.  This time, user space would find the subfacet
>> just
>> > > > > > created (subfacet_create()) due of the first packet, and attempt
>> > > > > > to install the same kernel flow again, but could not (kernel
>> returns
>> > > > > EEXIST).
>> > > > > > User space would then mark the subfacet as not installed
>> > > > > (SF_NOT_INSTALLED).
>> > > > >
>> > > > > Can we just ignore EEXIST in the error handling loop at the
>> bottom of
>> > > > > handle_miss_upcalls()?
>> > > > >
>> > >
>> > > OK, now I'm starting to see the whole picture.
>> > >
>> > > There are two EEXIST error cases:
>> > >
>> > >        1. A subfacet was installed successfully in a previous (recent)
>> > >           batch.  Now we've attempted to reinstall exactly the same
>> > >           subfacet in this batch.
>> > >
>> > >        2. A subfacet was installed successfully in a previous (recent)
>> > >           batch or earlier in the current batch.  We've attempted to
>> > >           install a subfacet for an overlapping megaflow.
>> > >
>> > > So far we have two ideas for handling these:
>> > >
>> > >        a) The current way, which is to consider on any installation
>> > >           error that a given subfacet is not installed.  This works
>> well
>> > >           for case #2 but causes case #1 to yield unexpected flows, as
>> > >           described in the commit message here.
>> > >
>> > >        b) The way proposed in this patch, which is the same as a) with
>> > >           the added wrinkle that we never try to reinstall exactly the
>> > >           same subfacet that we know we installed successfully earlier
>> > >           (and haven't deleted) unless its actions change.  This ought
>> > >           to work just as well as a) for case #2, and avoids the
>> problem
>> > >           with case #1.
>> > >
>> > > The (minor) flaw in b) is that if something goes wrong and a flow
>> > > disappears from the kernel (e.g. ovs-dpctl del-flows), then userspace
>> > > won't reinstall it (until it tries to delete it).
>> > >
>> > > If I understand the patch properly now, then I'm willing to commit it.
>> > > Is that right?
>> > >
>> > > Thanks,
>> > >
>> > > Ben.
>> > >
>>
>
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
>
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to