Hi,

We would like to resume our earlier discussion about how to support a simple, 
generic and efficient procedure for controllers to resync all OF forwarding 
state with OVS after a reconnect while maintaining non-stop forwarding (see 
http://openvswitch.org/pipermail/dev/2016-January/064925.html and following).

To briefly recap the earlier discussion, we have two main approaches:

A) A new OF experimenter procedure to resync state in three steps:
1. Controller marks the current state in OVS as stale
2. Controller downloads/refreshes the latest state
3. Controller tells the switch to clean up all remaining stale state
The proposed procedure is described in more detail in 
https://docs.google.com/document/d/1JBwARjUKDH_r9LK_Zg92WjquAxHrOLcqze1W60rV3j4
This procedure has been implemented and used between Ericsson's controller and 
OF switches for some years. A patch for OVS 2.5 is available and could be 
rebased to master.
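
To make the procedure concrete, the controller-side logic looks roughly like 
the sketch below. This is only an illustration: the method and message names 
(send_experimenter, RESYNC_BEGIN, RESYNC_DONE) are placeholders, not the 
actual experimenter messages defined in the document above.

def resync(conn, desired_flows):
    # 1. Mark all OpenFlow state currently installed on the switch as stale.
    conn.send_experimenter("RESYNC_BEGIN")

    # 2. Re-download the desired state. Every entry touched here loses its
    #    stale mark, and forwarding continues uninterrupted throughout.
    for flow in desired_flows:
        conn.send_flow_mod(flow)

    # 3. Tell the switch to remove whatever is still marked stale.
    conn.send_experimenter("RESYNC_DONE")
    conn.send_barrier()  # wait until the cleanup has completed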

B) Use the OF1.4 bundle mechanism as follows:
1. Controller opens a bundle for resync
2. Clear all flows, groups and meters in the bundle
3. Download latest state within the bundle
4. Commit the bundle to atomically swap the new state into the data path
The OF 1.4 bundle was implemented in OVS 2.5 but only for flows. Support for 
the bundle extension to OF 1.3 was added on master later. Groups and meters are 
not supported yet.
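
For flows, a close equivalent of this procedure can already be driven from a 
script. The sketch below assumes a bridge named br0, a dump of the desired 
flows in flows.txt, and an ovs-ofctl version where --bundle is accepted by 
replace-flows; replace-flows computes the adds/modifications/deletions needed 
to make the flow table match the file, and --bundle applies them as a single 
atomic bundle.

import subprocess

def bundle_resync(bridge="br0", flow_file="flows.txt"):
    # replace-flows makes the switch's flow table match flow_file; with
    # --bundle the required changes are sent inside one OF1.4 bundle and
    # committed atomically (flows only; groups and meters are not covered
    # by the current bundle code).
    subprocess.run(
        ["ovs-ofctl", "-O", "OpenFlow14", "--bundle",
         "replace-flows", bridge, flow_file],
        check=True)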

While we agree in principle that the bundle mechanism (with added support for 
groups and meters) would be a possible approach to the resync problem, our 
concern is that it was actually designed for a different use case, namely 
atomic incremental updates to the OF pipeline, and that the characteristics of 
the two approaches are very different in the resync scenario when a large 
volume of OpenFlow state is involved.

To analyze and quantify this difference in characteristics, we have done some 
benchmarking comparing the two approaches. Due to the limitations of the 
current bundle implementation we had to restrict the tests to flow entries. 
All tests were run on a VM with 6 cores and 3 GB RAM, with no traffic running, 
using scripts that drive ovs-ofctl to add flows from a file.
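
A simplified sketch of such a driver script is shown below; the bridge name, 
match pattern and flow count are illustrative, and the experimenter resync 
messages of approach A are not shown (the script only covers the flow 
download itself).

import subprocess
import time

def write_flow_file(path, n_flows):
    # Generate n_flows distinct flow entries; the match fields are
    # illustrative only.
    with open(path, "w") as f:
        for i in range(n_flows):
            f.write("table=0,priority=100,ip,nw_dst=10.%d.%d.%d,actions=drop\n"
                    % (i >> 16 & 255, i >> 8 & 255, i & 255))

def clear(bridge):
    # Empty the flow table between runs.
    subprocess.run(["ovs-ofctl", "del-flows", bridge], check=True)

def timed_add(bridge, path, bundle=False):
    # Plain OF1.3 download vs. OF1.4 bundled download.
    cmd = ["ovs-ofctl", "-O", "OpenFlow14" if bundle else "OpenFlow13"]
    if bundle:
        cmd.append("--bundle")
    cmd += ["add-flows", bridge, path]
    start = time.time()
    subprocess.run(cmd, check=True)
    return time.time() - start

write_flow_file("flows.txt", 250000)
clear("br0")
print("non-bundled: %.1f s" % timed_add("br0", "flows.txt"))
clear("br0")
print("bundled:     %.1f s" % timed_add("br0", "flows.txt", bundle=True))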

With the proposed hitless resync procedure we were able to resync 1 million 
flow entries without an increase in memory usage. Using the bundle procedure 
the VM ran out of memory at 1M and 500K flow entries; only at 250K flow 
entries were we able to obtain comparable measurements. At 250K flow entries 
the ovs-vswitchd process occupies 455 MB of virtual memory.

Measurements for resyncing 250K flow entries:
Metric                              Resync - OF1.3    Bundle - OF1.4
Flow update time                    ~40 sec           ~7 sec
Flow update rate                    ~6.25K/s          ~35K/s
ovs-vswitchd CPU usage              ~140%             ~100%
ovs-vswitchd virtual memory peak    457 MB            1905 MB

Refreshing the 250K flow entries using the proposed resync procedure requires 
40 seconds at ~140% CPU usage with stable memory at 457 MB. The download rate 
is ~6250 flows/s. The scan for stale flow entries at the end of the resync 
procedure takes the vswitchd process around 200 ms.

Refreshing the 250K flow entries using the bundle mechanism causes the 
vswitchd virtual memory to grow roughly linearly up to 1.9 GB, significantly 
more than the ~910 MB (2 x 455 MB) one would expect for accommodating two 
versions of each rule at the moment of the atomic activation.

Somewhat to our surprise, the download and activation of the 250K bundled 
flow entries takes only 7 seconds at 100% CPU load, much faster than the 
non-bundled download. Instrumenting the code with some additional log entries 
showed that the download of the bundle takes about 5 seconds, while the 
activation consumes the remaining 2 seconds, i.e. the bundled download rate 
is ~50K flows/s.

It appears that installing 250K flow entries individually in ofproto_dpif 
carries a significant processing overhead compared to the atomic activation of 
the same 250K entries in a bundle. What is the reason for this? Can this be 
improved by batching these updates internally?

Conclusion:
In their current form the two approaches indeed exhibit radically different 
characteristics. The bundle mechanism is more than 5 times faster, but it 
temporarily occupies over 4 times as much virtual memory. Given that in many 
cases the delta between the actual and desired flow state in OVS is small 
after a re-connect, we believe that the speed of the cleanup may not be so 
crucial, and that the ability to do it in place without requiring a lot of 
extra memory (reserved huge pages in the case of a DPDK datapath?) speaks in 
favor of the proposed resync procedure.

We would therefore like to ask the OVS community to reassess the proposed 
experimenter resync procedure in light of the empirical data presented above.

 Regards, Jan

