Hello! I am using OVS with DPDK in OpenStack. This is an RDO + TripleO deployment of the Train release. I am trying to measure the performance of the DPDK compute node. I have created two VMs [1]: one as the DUT, running DPDK, and one as a traffic generator, using SR-IOV [2]. Both of them run Pktgen.
What happens is the following: for the first 3-4 minutes the DUT receives about 2.6 Gbit/s [3], after which the rate always drops to about 400 Mbit/s [4]. Throughout the test, the output of the `pmd-rxq-show` command shows only one of the bond's interfaces loaded [5]. Sometimes, however, after the active interface flaps, the rate at the DUT rises to about 5 Gbit/s and `pmd-rxq-show` starts showing load on both interfaces [6]. Even then, after 3-4 minutes the rate drops to about 700 Mbit/s, while `pmd-rxq-show` keeps showing the same load on both interfaces in the bond.

In the logs I see nothing but the bond interfaces flapping [7], and the flapping has no effect on the drop that happens 3-4 minutes into the test. After the drop, if I send traffic from the DUT toward the traffic generator [8] for a while and then stop, the DUT rate recovers to 2.6 Gbit/s (traffic through one interface) or 5 Gbit/s (traffic through both interfaces), but again only for 3-4 minutes. If I limit the traffic generator to 2.5 Gbit/s or 1 Gbit/s, the rate at the DUT still drops, after 4-5 minutes.

I enabled debug logging for bond, dpdk, netdev_dpdk, and dpif_netdev, but saw nothing that explains what is going on. It is also unclear why traffic sometimes starts going through both bond interfaces after the active interface flaps; this happens rarely, not in every test.

[4] The interface carrying traffic to the DUT VM probably flaps because it alone carries the heavy load in the bond, and no LACP PDUs are getting to or from it. The log shows that it stays down for 30 seconds because the LACP rate is set to slow.

I have run the DUT on different operating systems, with different versions of DPDK and Pktgen, and the same thing always happens: after 3-4 minutes the rate drops. The only thing I have not changed is the DPDK compute node.
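To pin down exactly when the rate drops, and whether that moment lines up with the flaps in [7], one option is to sample a byte counter on a fixed interval and compute per-interval rates. Below is a minimal Python sketch, assuming the samples are cumulative `rx_bytes` values collected every few seconds, e.g. by polling `ovs-vsctl get Interface <name> statistics:rx_bytes`. The function names, the 50 % threshold, and the sample run are hypothetical and only illustrate the idea; they are not measurements.

```python
"""Sketch: detect when a throughput counter drops below its starting rate.

Assumption: samples are (timestamp_s, cumulative rx_bytes) pairs polled
periodically, e.g. from `ovs-vsctl get Interface <name> statistics:rx_bytes`.
The data in the __main__ block is made up for illustration.
"""

from typing import List, Optional, Tuple

Sample = Tuple[float, int]  # (timestamp in seconds, cumulative rx_bytes)


def rates_gbps(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Return (interval start time, Gbit/s) for each pair of consecutive samples."""
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        rates.append((t0, (b1 - b0) * 8 / dt / 1e9))
    return rates


def first_drop(samples: List[Sample], ratio: float = 0.5,
               baseline_n: int = 3) -> Optional[float]:
    """Return the start time of the first interval whose rate falls below
    `ratio` times the mean of the first `baseline_n` intervals, else None."""
    rates = rates_gbps(samples)
    if len(rates) <= baseline_n:
        return None
    baseline = sum(r for _, r in rates[:baseline_n]) / baseline_n
    for t, r in rates[baseline_n:]:
        if r < ratio * baseline:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical run: 2.6 Gbit/s for 200 s, then 0.4 Gbit/s, sampled every 10 s.
    samples, total = [], 0
    for i in range(31):
        t = i * 10.0
        samples.append((t, total))
        total += 3_250_000_000 if t < 200 else 500_000_000
    print("rate dropped at t =", first_drop(samples), "s")
```

Logging the drop time from the same clock as the OVS log makes it easy to check whether the drop coincides with a flap or happens independently of it.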
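To compare `pmd-rxq-show` snapshots such as [5] and [6] taken before and after the drop, a small parser can sum the reported PMD usage per port. This is a sketch assuming the OVS 2.15-style line format `port: <name> queue-id: <n> (enabled) pmd usage: <n> %`; the regular expressions may need adjusting for other versions, and the port names in the sample input are hypothetical.

```python
"""Sketch: parse `ovs-appctl dpif-netdev/pmd-rxq-show` output and sum PMD usage per port.

Assumes OVS 2.15-style lines; adjust the regexes for other versions.
"""

import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

PMD_RE = re.compile(r"pmd thread numa_id (\d+) core_id (\d+):")
RXQ_RE = re.compile(
    r"port:\s*(\S+)\s+queue-id:\s*(\d+).*?pmd usage:\s*(\d+|NOT AVAIL)")

Row = Tuple[int, int, str, int, Optional[int]]  # numa, core, port, queue, usage %

# Illustrative input only; port names are hypothetical.
SAMPLE = """\
pmd thread numa_id 0 core_id 11:
  isolated : false
  port: dpdk0             queue-id:  0 (enabled)   pmd usage: 85 %
  port: vhu-dut           queue-id:  0 (enabled)   pmd usage:  2 %
pmd thread numa_id 0 core_id 21:
  isolated : false
  port: dpdk1             queue-id:  0 (enabled)   pmd usage:  0 %
"""


def parse_rxq_show(text: str) -> List[Row]:
    """Return one row per rx queue, tagged with the PMD thread that polls it."""
    rows: List[Row] = []
    numa = core = None
    for line in text.splitlines():
        m = PMD_RE.search(line)
        if m:
            numa, core = int(m.group(1)), int(m.group(2))
            continue
        m = RXQ_RE.search(line)
        if m and core is not None:
            usage = None if m.group(3) == "NOT AVAIL" else int(m.group(3))
            rows.append((numa, core, m.group(1), int(m.group(2)), usage))
    return rows


def busy_ports(rows: List[Row], threshold: int = 10) -> Dict[str, int]:
    """Sum PMD usage per port and keep ports at or above `threshold` percent."""
    totals: Dict[str, int] = defaultdict(int)
    for _numa, _core, port, _queue, usage in rows:
        if usage is not None:
            totals[port] += usage
    return {port: used for port, used in totals.items() if used >= threshold}


if __name__ == "__main__":
    print(busy_ports(parse_rxq_show(SAMPLE)))
```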
The compute node has an Intel E810 network card with 25 Gbit/s ports and an Intel Xeon Gold 6230R CPU. The PMD threads use cores 11, 21, 63, and 73 on NUMA 0 and 36, 44, 88, and 96 on NUMA 1. The compute nodes run Rocky Linux 8.5, Open vSwitch 2.15.5, and DPDK 20.11.1. In addition, I am attaching [9] the `ovs-vsctl show` output, [10] an OVSDB dump, [11] the `pmd-stats-show` output, and [12] bond info from `ovs-appctl`.

What could be the cause of this behavior? I don't understand where I should look to find out exactly what is going on.

1. https://that.guru/blog/pktgen-between-two-openstack-guests
2. https://freeimage.host/i/J206p8Q
3. https://freeimage.host/i/J20Po9p
4. https://freeimage.host/i/J20PRPs
5. https://pastebin.com/rpaggexZ
6. https://pastebin.com/Zhm779vT
7. https://pastebin.com/Vt5P35gc
8. https://freeimage.host/i/J204SkB
9. https://pastebin.com/rNJZeyPy
10. https://pastebin.com/wEifvivH
11. https://pastebin.com/pELywZUQ
12. https://pastebin.com/PTV6fWEb
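To cross-check the PMD core list given in this post (11, 21, 63, 73 on NUMA 0; 36, 44, 88, 96 on NUMA 1) against `other_config:pmd-cpu-mask` in the OVSDB dump [10], the mask can be computed from, or decoded into, core IDs, where bit N selects core N. A minimal Python sketch; the helper names are hypothetical.

```python
"""Sketch: convert between a PMD core list and an OVS pmd-cpu-mask hex value."""

from typing import Iterable, List


def cores_to_mask(cores: Iterable[int]) -> str:
    """Set bit N for each core N and return the mask as a 0x-prefixed hex string."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return hex(mask)


def mask_to_cores(mask: str) -> List[int]:
    """Return the core IDs whose bits are set, in ascending order."""
    value = int(mask, 16)  # accepts both "0x..." and bare hex digits
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    numa0 = [11, 21, 63, 73]
    numa1 = [36, 44, 88, 96]
    mask = cores_to_mask(numa0 + numa1)
    print("pmd-cpu-mask:", mask)
    print("decoded:", mask_to_cores(mask))
```

If decoding the mask from [10] yields cores other than these eight, the PMD placement differs from the one described in the post.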
_______________________________________________
discuss mailing list
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
