Hello!

I am using OVS with DPDK in OpenStack. This is an RDO+TripleO deployment of the
Train release. I am trying to measure the performance of the DPDK compute node.
I have created two VMs [1]: one as a DUT with DPDK and one as a traffic
generator with SR-IOV [2]. Both of them run Pktgen.

What happens is the following. For the first 3-4 minutes I see 2.6Gbit [3]
received at the DUT; after that the speed always drops to 400Mbit [4]. During
this time the output of `pmd-rxq-show` always shows load on only one of the
interfaces in the bond [5].

Sometimes, after the active interface flaps, the speed at the DUT increases to
5Gbit and `pmd-rxq-show` starts to show load on two interfaces [6]. But here
too the speed drops after 3-4 minutes, to 700Mbit, while `pmd-rxq-show`
continues to show the same load on both interfaces in the bond. In the logs I
see nothing but flapping [7] of the interfaces in the bond, and the flapping
has no effect on the speed drop after 3-4 minutes of testing.

After the speed drops, if I run traffic from the DUT itself towards the traffic
generator [8] for a while and then stop, the speed at the DUT is restored to
2.6Gbit with traffic going through one interface, or 5Gbit with traffic going
through two interfaces, but again only for 3-4 minutes. If I limit the traffic
generator to 2.5Gbit or 1Gbit, the speed at the DUT also drops after 4-5
minutes.

I have enabled debug logging for bond, dpdk, netdev_dpdk and dpif_netdev, but
have not seen anything that clarifies what is going on. It is also not clear
why, after the active interface flaps, traffic sometimes starts going through
both interfaces in the bond; this happens rarely, not in every test.
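
To line up the moment of the drop with the PMD and bond state, the ovs-appctl
output could be sampled at a fixed interval while the test runs. Below is a
minimal sketch in Python, not something I have run on the compute node. The
`sample` helper, the 10-second interval and the sample count are hypothetical
choices of mine, and I am assuming `bond/show` is the right command for the
bond state; `dpif-netdev/pmd-rxq-show` and `dpif-netdev/pmd-stats-show` are the
full names of the commands mentioned in this message.

```python
import subprocess
import time
from typing import Callable, Iterator, Sequence, Tuple

# Commands to sample. The interval and count used with them are illustrative.
COMMANDS = [
    ["ovs-appctl", "dpif-netdev/pmd-rxq-show"],
    ["ovs-appctl", "dpif-netdev/pmd-stats-show"],
    ["ovs-appctl", "bond/show"],  # assumption: this is the bond state of interest
]


def run_command(argv: Sequence[str]) -> str:
    """Run one command and return its output (stderr folded into stdout)."""
    result = subprocess.run(
        list(argv),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    return result.stdout


def sample(
    commands: Sequence[Sequence[str]],
    interval_s: float,
    count: int,
    runner: Callable[[Sequence[str]], str] = run_command,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> Iterator[Tuple[float, str, str]]:
    """Yield (timestamp, command line, output) for each command, `count` times."""
    for i in range(count):
        stamp = clock()
        for argv in commands:
            yield stamp, " ".join(argv), runner(argv)
        if i + 1 < count:  # no need to wait after the last round
            sleep(interval_s)


# Example use on the compute node (about 10 minutes at 10-second steps):
#
#     for stamp, cmd, output in sample(COMMANDS, 10.0, 60):
#         print("=== %s %s" % (time.strftime("%H:%M:%S", time.localtime(stamp)), cmd))
#         print(output)
```

Redirecting the printed output to a file would give one timestamped snapshot
per interval, which can then be compared just before and just after the drop.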

Regarding the flapping [7]: the interface through which traffic goes to the DUT
VM probably flaps because it alone carries the heavy load in the bond and no
LACP PDUs get to or from it. The log shows that it is down for 30 seconds
because the LACP rate is set to slow mode.

I have built the DUT on different operating systems, with different versions of
DPDK and Pktgen, but the same thing always happens: after 3-4 minutes the speed
drops. The only thing I did not change is the DPDK compute node itself. The
compute node has an Intel E810 network card with 25Gbit ports and an Intel Xeon
Gold 6230R CPU. The PMD threads use cores 11, 21, 63, 73 on NUMA 0 and 36, 44,
88, 96 on NUMA 1.

In addition:
[9] ovs-vsctl show
[10] OVSDB dump
[11] pmd-stats-show
[12] bond info with ovs-appctl

For compute nodes, I use Rocky Linux 8.5, Open vSwitch 2.15.5, and DPDK 20.11.1.


What could be the cause of this behavior? I don't understand where I should 
look to find out exactly what is going on.


[1] https://that.guru/blog/pktgen-between-two-openstack-guests
[2] https://freeimage.host/i/J206p8Q
[3] https://freeimage.host/i/J20Po9p
[4] https://freeimage.host/i/J20PRPs
[5] https://pastebin.com/rpaggexZ
[6] https://pastebin.com/Zhm779vT
[7] https://pastebin.com/Vt5P35gc
[8] https://freeimage.host/i/J204SkB
[9] https://pastebin.com/rNJZeyPy
[10] https://pastebin.com/wEifvivH
[11] https://pastebin.com/pELywZUQ
[12] https://pastebin.com/PTV6fWEb


