I had done an internal patch on the OVS 2.0 code (it does not seem like years ago). There, the default timeout was 5 seconds for flow counts below flow_eviction_threshold. The histogram logic as written had some algorithmic issues, so it had the potential to thrash the system with excessive flow-removal counts per loop once the flow count exceeded flow_eviction_threshold.
This discard of 99% of the flows (keeping only total / 100, which in reality meant all flows over flow_eviction_threshold) on each revalidator loop might have been what caused the thrashing problem observed with the earlier OVS versions. Making the revalidator more efficient might be another way to help with this issue.

Mike

diff --git a/openvswitch/ofproto/ofproto-dpif.c b/openvswitch/ofproto/ofproto-dpif.c
index 92f3262..3dd297f 100644
--- a/openvswitch/ofproto/ofproto-dpif.c
+++ b/openvswitch/ofproto/ofproto-dpif.c
@@ -76,6 +76,19 @@ COVERAGE_DEFINE(subfacet_install_fail);
 COVERAGE_DEFINE(packet_in_overflow);
 COVERAGE_DEFINE(flow_mod_overflow);
 
+/* Flow IDLE timeout definitions */
+
+/* Milliseconds to timeout flow, original 5000 */
+#define IDLE_FLOW_TIMEOUT 30000
+/* Milliseconds to timeout flow minimum, original 100 */
+#define IDLE_FLOW_TIMEOUT_MIN 5000
+/* Milliseconds to timeout special flow, original 10000 */
+#define IDLE_FLOW_TIMEOUT_SPECIAL 40000
+/* Idle histogram bucket width (to keep same number of buckets), original 100 */
+#define IDLE_HIST_TIME_WIDTH 500
+/* Idle upper amount to keep each discard cycle, original 0.01 */
+#define IDLE_EVICTION_KEEP_RATE 0.9
+
 /* Number of implemented OpenFlow tables. */
 enum { N_TABLES = 255 };
 enum { TBL_INTERNAL = N_TABLES - 1 };    /* Used for internal hidden rules. */
@@ -3786,17 +3799,17 @@ subfacet_max_idle(const struct dpif_backer *backer)
  * pass made by update_stats(), because the former function never looks at
  * uninstallable subfacets. */
 
-    enum { BUCKET_WIDTH = ROUND_UP(100, TIME_UPDATE_INTERVAL) };
-    enum { N_BUCKETS = 5000 / BUCKET_WIDTH };
+    enum { BUCKET_WIDTH = ROUND_UP(IDLE_HIST_TIME_WIDTH, TIME_UPDATE_INTERVAL) };
+    enum { N_BUCKETS = IDLE_FLOW_TIMEOUT / BUCKET_WIDTH };
     int buckets[N_BUCKETS] = { 0 };
-    int total, subtotal, bucket;
+    int total, subtotal, bucket, keep, idle_timeout;
     struct subfacet *subfacet;
     long long int now;
     int i;
 
     total = hmap_count(&backer->subfacets);
     if (total <= flow_eviction_threshold) {
-        return N_BUCKETS * BUCKET_WIDTH;
+        return IDLE_FLOW_TIMEOUT;
     }
 
     /* Build histogram. */
@@ -3810,11 +3823,13 @@ subfacet_max_idle(const struct dpif_backer *backer)
     }
 
     /* Find the first bucket whose flows should be expired. */
-    subtotal = bucket = 0;
+    keep = MAX(flow_eviction_threshold, (int)(total * IDLE_EVICTION_KEEP_RATE));
+    subtotal = bucket = idle_timeout = 0;
     do {
         subtotal += buckets[bucket++];
+        idle_timeout += BUCKET_WIDTH;
     } while (bucket < N_BUCKETS &&
-             subtotal < MAX(flow_eviction_threshold, total / 100));
+             (subtotal < keep || idle_timeout < IDLE_FLOW_TIMEOUT_MIN));
 
     if (VLOG_IS_DBG_ENABLED()) {
         struct ds s;
@@ -3833,7 +3848,7 @@ subfacet_max_idle(const struct dpif_backer *backer)
         ds_destroy(&s);
     }
 
-    return bucket * BUCKET_WIDTH;
+    return idle_timeout;
 }
 
 static void
@@ -3844,7 +3859,7 @@ expire_subfacets(struct dpif_backer *backer, int dp_max_idle)
 
     /* We really want to keep flows for special protocols around, so use a more
      * conservative cutoff. */
-    long long int special_cutoff = time_msec() - 10000;
+    long long int special_cutoff = time_msec() - IDLE_FLOW_TIMEOUT_SPECIAL;
     struct subfacet *subfacet, *next_subfacet;
     struct subfacet *batch[SUBFACET_DESTROY_MAX_BATCH];

-----Original Message-----
From: Ethan Jackson [mailto:et...@nicira.com]
Sent: Friday, June 20, 2014 11:11 AM
To: Polehn, Mike A
Cc: Alex Wang; dev@openvswitch.org
Subject: Re: [ovs-dev] [Backport] Backport max-idle to branch-1.10 - branch-2.1.
> I think the setting of 1.5 seconds is due to inexperience and needs to be
> drastically changed. If a flow timeout is specified on the OpenFlow command,
> the exact-match flow timeout should use the OpenFlow set timeout and not an
> arbitrary value, since there was probably a reason for setting the particular
> value.

The 1.5 second number does not come from inexperience; in fact, exactly the opposite. Over years of running Open vSwitch in multiple production deployments, we've found that a key factor in maintaining reasonable performance is management of the datapath flow cache. If the idle timeout is too large, then the datapath fills up with unused flows which stress the revalidators and take up space that newer, more useful flows could occupy.

I can see that when doing performance testing a larger number may be more convenient, which is why we allow it to be configured. That said, we don't intend to change the default.

Ethan

-----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Alex Wang
> Sent: Thursday, June 19, 2014 9:20 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [Backport] Backport max-idle to branch-1.10 - branch-2.1.
>
> This series backports the commit 72310b04 (upcall: Configure datapath
> max-idle through ovs-vsctl.) to branch-1.10 - branch-2.1, for testing
> purposes.
>
> --
> 1.7.9.5

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
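For anyone testing these backports, commit 72310b04 exposes the timeout through the other_config column of the Open_vSwitch table; a sketch of adjusting it (the 10000 ms value is illustrative, not a recommendation):

```shell
# Raise the datapath flow idle time from the 1.5 s default to 10 s.
# The value is in milliseconds and requires the max-idle support from
# commit 72310b04 (or these backports).
ovs-vsctl set Open_vSwitch . other_config:max-idle=10000

# Return to the default by deleting the key.
ovs-vsctl remove Open_vSwitch . other_config max-idle
```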