On Thu, Apr 14, 2016 at 08:37:27AM -0500, Ryan Moats wrote: > I've pretty much become fed up with the "raceful" nature of the E2E > ovn test cases and so I've set my self the goal of fixing them. > > After some thought last night, I *think* I might have found a way > to do it. Now, since I'm not 100% that my idea is the cleanest way > to fix things, I thought I'd throw the idea out on the table first > and get some feedback before I go off and write code. > > I'm thinking of the following changes: > > in ovn/controller/ofctrl.c, change ofctrl_put to return a boolean > (true if a flow is queue, false otherwise). > > in ovn/controller/ovn-controller.c: > 1. add a command line argument (--unit-test). > 2. if ofctrl_put return true and the above command line argument > is specified, dump a line to the log file saying that ovn-controller > hasn't made any OF changes in the last loop > > in the ovn E2E tests cases: > 1. start ovn-controller with the --unit-test argument > 2. instead of sleep 1 for waiting for ovn-northd/ovn-controller to > quiesce, look at the tail of the ovn-controller logs for two > consecutive loops where ovn-controller hasn't made any OF changes > in the last loop
I've had a different idea for how to do this for a while now. Let me see... You might not be familiar with how ovs-vsctl waits until the changes that it makes to the Open vSwitch configuration take effect before exiting. It uses a sequence number protocol in the Open_vSwitch database table. This table (which has exactly one row) has two integer columns, named cur_cfg and next_cfg. Initially cur_cfg and next_cfg are both zero. When ovs-vsctl modifies the configuration, it also atomically increments next_cfg. Before ovs-vswitchd reconfigures itself, it reads next_cfg, then when it finishes it sets cur_cfg to the next_cfg value that it read. ovs-vsctl waits until cur_cfg is greater than or equal to the next_cfg that it set before it exits. This has a number of nice properties. It has low overhead in time and space, it works efficiently with any number of writers, and later on it is easy to observe what happened and when by looking at the database log. My thought was to extend this concept to the OVN distributed system. For example: * Add a next_cfg value somewhere in the OVN_Northbound database. When ovn-nbctl or something else updates the database and wants to allow for finding out where the updates are propagated, it increments next_cfg. * Add a similar next_cfg somewhere in OVN_Southbound, and make ovn-northd propagate the value from northbound to southbound. * Add a cur_cfg column to the Chassis table in OVN_Southbound. When an ovn-controller brings its chassis up to date with a given configuration, it stores the next_cfg value that it just got up-to-date with into its own cur_cfg. * In theory ovn-northd could propagate the current minimum value of cur_cfg back to the northbound database, if that's useful. In a real system with constant failures though it's hard for me to guess whether it's realistic. This could even be used such that ovn-nbctl waits for the whole distributed system to catch up before it exits. What are your thoughts? _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev