On 10/6/23 20:10, Алексей Кашавкин via discuss wrote:
> Hello!
>
> I am using OVS with DPDK in OpenStack. This is an RDO+TripleO deployment
> with the Train release. I am trying to measure the performance of the DPDK
> compute node. I have created two VMs [1], one as a DUT with DPDK and one as
> a traffic generator with SR-IOV [2]. Both of them are using Pktgen.
>
> What happens is the following: for the first 3-4 minutes I see 2.6Gbit [3]
> received on the DUT, after which the speed always drops to 400Mbit [4]. At
> the same time, the output of the `pmd-rxq-show` command always shows only
> one of the interfaces in the bond loaded [5]. Occasionally, after the
> active interface flaps, the speed on the DUT increases up to 5Gbit and the
> `pmd-rxq-show` output starts showing load on both interfaces [6]. But
> again, after 3-4 minutes the speed drops to 700Mbit, while `pmd-rxq-show`
> still shows the same load on the two interfaces in the bond. The logs show
> nothing but flapping [7] of the interfaces in the bond, and the flapping
> has no effect on the speed drop after 3-4 minutes of testing. After the
> speed drop, I run traffic from the DUT itself towards the traffic
> generator [8] for a while and then stop; the speed on the DUT is then
> restored to 2.6Gbit with traffic going through one interface, or 5Gbit
> with traffic going through two interfaces, but again only for 3-4 minutes.
> If I run the test with the traffic generator limited to 2.5Gbit or 1Gbit,
> the speed at the DUT also drops after 4-5 minutes. I've set logging to
> debug for bond, dpdk, netdev_dpdk, and dpif_netdev, but haven't seen
> anything that clarifies what is going on. It is also unclear why traffic
> only sometimes starts going through both interfaces in the bond after the
> active interface flaps; this happens rarely, not in every test.
Since the rate is restored after you send some traffic in the backward
direction, I'd say you have MAC learning somewhere on the path and the
learned entries are expiring.  For example, if you use the NORMAL action in
one of the bridges, then once the MAC entry expires, the bridge will start
flooding packets to all ports of the bridge, which is very slow.  You can
look at the datapath flow dump to confirm which actions are getting executed
on your packets: ovs-appctl dpctl/dump-flows (a rough example of what to
look for is at the end of this mail).  In general, you should continuously
send some traffic in the reverse direction so that learned MAC addresses do
not expire.  I'm not sure if Pktgen is doing that these days, but it wasn't
a very robust piece of software in the past.

>
> [4] The flapping of the interface through which traffic is going to the DUT
> VM is probably due to the fact that it is heavily loaded alone in the bond
> and there are no LACP PDU packets going to or from it. The log shows that
> it is down for 30 seconds because the LACP rate is set to slow mode.

Dropped LACP packets can indeed cause bond flapping.  The only way to fix
that in older versions of OVS is to reduce the load.  With OVS 3.2 you may
try the experimental 'rx-steering' configuration, which was designed exactly
for this scenario and should ensure that LACP PDUs are not dropped (example
configuration at the end of this mail).  Also, balancing depends on packet
hashes, so you need to send many different traffic flows in order to get
consistent balancing.

>
> I have run the DUT on different OSes, with different versions of DPDK and
> Pktgen, but the same thing always happens: after 3-4 minutes the speed
> drops. The DPDK compute node is the only thing I didn't change. The compute
> node has an Intel E810 network card with 25Gbit ports and an Intel Xeon
> Gold 6230R CPU. The PMD threads use cores 11, 21, 63, 73 on NUMA 0 and
> 36, 44, 88, 96 on NUMA 1.

All in all, 2.6Gbps seems like a small number for the type of system you
have.  You might have some other configuration issues.

>
> In addition:
> [9] ovs-vsctl show
> [10] OVSDB dump
> [11] pmd-stats-show
> [12] bond info with ovs-appctl
>
> For compute nodes, I use Rocky Linux 8.5, Open vSwitch 2.15.5, and DPDK
> 20.11.1.

FWIW, OVS 2.15 reached EOL ~1.5 years ago.

Best regards, Ilya Maximets.

>
> What could be the cause of this behavior? I don't understand where I
> should look to find out exactly what is going on.
>
> 1. https://that.guru/blog/pktgen-between-two-openstack-guests
> 2. https://freeimage.host/i/J206p8Q
> 3. https://freeimage.host/i/J20Po9p
> 4. https://freeimage.host/i/J20PRPs
> 5. https://pastebin.com/rpaggexZ
> 6. https://pastebin.com/Zhm779vT
> 7. https://pastebin.com/Vt5P35gc
> 8. https://freeimage.host/i/J204SkB
> 9. https://pastebin.com/rNJZeyPy
> 10. https://pastebin.com/wEifvivH
> 11. https://pastebin.com/pELywZUQ
> 12. https://pastebin.com/PTV6fWEb
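
P.S. A rough sketch of the datapath flow check mentioned above; the port
numbers in the sample actions are made up, yours will differ:

  # Dump the flows currently installed in the datapath:
  ovs-appctl dpctl/dump-flows

A flow that matches your test traffic and ends in a single output port,
e.g. "actions:5", means the destination MAC is known.  A long list of
ports, e.g. "actions:2,4,5,7", means the packets are being flooded and
expiring MAC learning is the likely culprit.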
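
P.P.S. And a sketch of the OVS 3.2 rx-steering configuration; 'dpdk0' is a
placeholder for your physical DPDK port, and please double-check the DPDK
physical-port documentation shipped with 3.2 for the exact syntax and
caveats:

  # Steer LACP packets to a dedicated extra Rx queue on the port, so they
  # are not dropped when the regular queues are overloaded:
  ovs-vsctl set Interface dpdk0 options:rx-steering=rss+lacp

  # Default behavior (no steering):
  ovs-vsctl set Interface dpdk0 options:rx-steering=rss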