Hi,

We did further tests and found that it is indeed the conntrack global lock
introduced by the commit below that is causing the performance degradation.

We did perf analysis with and without the commit below and saw a huge
increase in pthread_mutex_lock samples. In our testbed we had 4 PMD threads
handling traffic from two DPDK ports and various vhost-user (VHU) ports.

At the data structure level, we could see a major change in how the
connections are stored in the conntrack structure.
Before:
struct conntrack_bucket {
    struct ct_lock lock;
    struct hmap connections OVS_GUARDED;
    struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
    struct ovs_mutex cleanup_mutex;
    long long next_cleanup OVS_GUARDED;
};

After (diff excerpt from the commit):
struct conntrack {
-    /* Independent buckets containing the connections */
-    struct conntrack_bucket buckets[CONNTRACK_BUCKETS];
..
+    struct ovs_mutex ct_lock; /* Protects 2 following fields. */
+    struct cmap conns OVS_GUARDED;
+    struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;

};

Earlier, the 'conntrack_bucket' structure held the list of connections for a
given hash bucket. That was removed; all connections are now added to the
main 'conntrack' structure, and the list traversal is protected by the global
'ct_lock'.

We see that taking the global 'ct->ct_lock' in 'conn_update_expiration'
(which happens for every packet) accounts for much of the performance drop.
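
For reference, after the commit the per-packet timeout refresh looks roughly
like the sketch below. This is a simplified illustration based on the
structures above, not the exact upstream code; names such as 'ct_timeout_val'
and 'exp_node' follow the conntrack internals.

static void
conn_update_expiration_sketch(struct conntrack *ct, struct conn *conn,
                              enum ct_timeout tm, long long now)
{
    /* Every PMD thread serializes on the single global ct->ct_lock just to
     * move the connection to the tail of its expiration list. */
    ovs_mutex_lock(&ct->ct_lock);
    conn->expiration = now + ct_timeout_val[tm];
    ovs_list_remove(&conn->exp_node);
    ovs_list_push_back(&ct->exp_lists[tm], &conn->exp_node);
    ovs_mutex_unlock(&ct->ct_lock);
}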

Earlier, with conn_key_hash, each new connection was mapped to a matching
hash bucket. Any state update (mostly the expiration time) involved moving
the connection back into the list of connections belonging to that hash
bucket. This was done under the bucket-level lock, and with 256 buckets there
was much less contention.
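
Roughly, the old path looked like the sketch below (again simplified; the
bucket selection and lock helpers stand in for the pre-commit code):

static void
bucket_update_expiration_sketch(struct conntrack *ct, struct conn *conn,
                                uint32_t hash, enum ct_timeout tm,
                                long long now)
{
    /* The connection's hash picks one of CONNTRACK_BUCKETS (256) independent
     * buckets, and only that bucket's lock is taken, so 4 PMD threads rarely
     * contend with each other. */
    struct conntrack_bucket *ctb = &ct->buckets[hash % CONNTRACK_BUCKETS];

    ct_lock_lock(&ctb->lock);
    conn->expiration = now + ct_timeout_val[tm];
    ovs_list_remove(&conn->exp_node);
    ovs_list_push_back(&ctb->exp_lists[tm], &conn->exp_node);
    ct_lock_unlock(&ctb->lock);
}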

Now this 'ct->ct_lock' adds more contention and is causing the performance 
degradation.

We also ran the test-conntrack benchmark.

1. The standard single-thread test:

After commit
$ ./ovstest test-conntrack benchmark 1 14880000 32
conntrack:   2230 ms

Before commit
$ ./ovstest test-conntrack benchmark 1 14880000 32
conntrack:   1673 ms

2. We also ran the multi-thread test (4 threads):

$ ./ovstest test-conntrack benchmark 4 33554432 32 1    (32 Million packets)
Before : conntrack:  15043 ms / conntrack:  14644 ms
After  : conntrack:  71373 ms / conntrack:  65816 ms

So, with an increase in the number of connections and multiple threads doing
conntrack_execute, the impact is even more pronounced.

Are there any changes expected to fix this performance issue in the near
future?

Do we have conntrack-related performance tests that are run with every
release?

Thanks
Kiran

From: K Venkata Kiran
Sent: Thursday, August 6, 2020 4:20 PM
To: ovs-...@openvswitch.org; ovs-discuss@openvswitch.org; Darrell Ball 
<dlu...@gmail.com>; b...@ovn.org
Cc: Anju Thomas <anju.tho...@ericsson.com>; K Venkata Kiran 
<k.venkata.ki...@ericsson.com>
Subject: Performance drop with conntrack flows

Hi,

We see a 40% traffic drop with UDP traffic over VxLAN and a 20% traffic drop
with UDP traffic over MPLSoGRE between OVS 2.8.2 and OVS 2.12.1.

We narrowed the performance drop in our test down to the commit below, and
backing out that commit fixed the problem.

The commit of concern is:
https://github.com/openvswitch/ovs/commit/967bb5c5cd9070112138d74a2f4394c50ae48420
commit 967bb5c5cd9070112138d74a2f4394c50ae48420
Author: Darrell Ball <dlu...@gmail.com>
Date:   Thu May 9 08:15:07 2019 -0700
 conntrack: Add rcu support.

We suspect the 'ct->ct_lock' taken for 'conn_update_state' and for
'conn_key_lookup' could be causing the issue.

Has anyone noticed this issue, and are there any pointers on a fix? We could
not find any obvious commit that would solve it. Any guidance on solving this
issue would help.
Thanks
Kiran
