----- Original Message ----- > From: "Ryan Moats" <rmo...@us.ibm.com> > To: "Lance Richardson" <lrich...@redhat.com> > Cc: dev@openvswitch.org > Sent: Tuesday, July 5, 2016 12:17:17 PM > Subject: Re: [ovs-dev] [PATCH] ovn-controller: eliminate stall in ofctrl > state machine > > "dev" <dev-boun...@openvswitch.org> wrote on 07/05/2016 07:58:24 AM: > > > From: Lance Richardson <lrich...@redhat.com> > > To: dev@openvswitch.org > > Date: 07/05/2016 07:58 AM > > Subject: [ovs-dev] [PATCH] ovn-controller: eliminate stall in ofctrl > > state machine > > Sent by: "dev" <dev-boun...@openvswitch.org> > > > > The "ovn -- 2 HVs, 3 LRs connected via LS, static routes" > > test case currently exhibits frequent failures. These failures > > occur because, at the time that the test packets are sent to > > verify forwarding, no flows have been installed in the vswitch > > for one of the hypervisors. > > > > Investigation shows that, in the failing case, the ofctrl state > > machine has not yet transitioned to the S_UPDATE_FLOWS state. > > This occurrs when ofctrl_run() is called and: > > 1) The state is S_TLV_TABLE_MOD_SENT. > > 2) An OFPTYPE_NXT_TLV_TABLE_REPLY message is queued for reception. > > 3) No event (other than SB probe timer expiration) is expected > > that would unblock poll_block() in the main ovn-controller > > loop. > > > > In this scenario, ofctrl_run() will move state to S_CLEAR_FLOWS > > and return, without having executed run_S_CLEAR_FLOWS() which > > would have immediately transitioned the state to S_UPDATE_FLOWS > > which is needed in order for ovn-controller to configure flows > > in ovs-vswitchd. After a delay of about 5 seconds (the default > > SB probe timer interval), ofctrl_run() would be called again > > to make the transition to S_UPDATE_FLOWS, but by this time > > the test case has already failed. > > > > Fix by expanding the state machine's "while state != old_state" > > loop to include processing of receive events. Without this > > fix, around 40 failures are seen out of 100 attempts, with > > this fix no failures have been observed in several hundred > > attempts. > > > > Signed-off-by: Lance Richardson <lrich...@redhat.com> > > --- > > I was going to simple ack this as being useful for the unit tests, > but then I got to wondering if it made a difference in the real > world, so I set up the following: > > 4 node OpenStack cloud running tip of tree master. > Run rally's create-and-list-ports test four times: > 1a+b) with 15 repetitions (150 ports) and 3 tenants, > w/ and w/o this patch > 2a+b) with 15 repetitions (150 ports) and 15 tenants, > w/ and w/o this patch > > Dumping data and running t-tests on the create port times gives me: > > - a very statistically significant 22% difference between 1a and 1b > (2-tailed P of 0.0057) > - an extremely statistically significant 35% difference between 2a > and 2b (2-tailed P of 0.0001) > > So... > > Acked-By: Ryan Moats <rmo...@us.ibm.com> > Tested-By: Ryan Moats <rmo...@us.ibm.com> > > Can we get this merged quickly, pretty please??? :) > >
Nice, thanks for reviewing and testing! _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev