Hi Mário,

2024-06-19 08:45 (UTC+0200), Mário Kuka:
> Hello,
>
> I want to use hairpin queues to forward high priority traffic (such as
> LACP).
> My goal is to ensure that this traffic is not dropped in case the
> software pipeline is overwhelmed.
> But during testing with dpdk-testpmd I can't achieve full throughput for
> hairpin queues.
For maintainers: I'd like to express interest in this use case too.

> The best result I have been able to achieve for 64B packets is 83 Gbps
> in this configuration:
>
> $ sudo dpdk-testpmd -l 0-1 -n 4 -a 0000:17:00.0,hp_buf_log_sz=19 --
>     --rxq=1 --txq=1 --rxd=4096 --txd=4096 --hairpinq=2
> testpmd> flow create 0 ingress pattern eth src is 00:10:94:00:00:03 /
>     end actions rss queues 1 2 end / end

Try enabling "Explicit Tx rule" mode if possible.
I was able to achieve 137 Mpps @ 64B with the following command:

    dpdk-testpmd -a 21:00.0 -a c1:00.0 --in-memory -- \
        -i --rxq=1 --txq=1 --hairpinq=8 --hairpin-mode=0x10

You might get even better speed, because my flow rules were more
complicated (an RTE Flow based "router on-a-stick"):

flow create 0 ingress group 1 pattern eth / vlan vid is 721 / end actions of_set_vlan_vid vlan_vid 722 / rss queues 1 2 3 4 5 6 7 8 end / end
flow create 1 ingress group 1 pattern eth / vlan vid is 721 / end actions of_set_vlan_vid vlan_vid 722 / rss queues 1 2 3 4 5 6 7 8 end / end
flow create 0 ingress group 1 pattern eth / vlan vid is 722 / end actions of_set_vlan_vid vlan_vid 721 / rss queues 1 2 3 4 5 6 7 8 end / end
flow create 1 ingress group 1 pattern eth / vlan vid is 722 / end actions of_set_vlan_vid vlan_vid 721 / rss queues 1 2 3 4 5 6 7 8 end / end
flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 1 ingress group 0 pattern end actions jump group 1 / end

> For packets in the range 68-80B I measured even lower throughput.
> Full throughput I measured only for packets larger than 112B.
>
> For only one queue, I didn't get more than 55 Gbps:
>
> $ sudo dpdk-testpmd -l 0-1 -n 4 -a 0000:17:00.0,hp_buf_log_sz=19 --
>     --rxq=1 --txq=1 --rxd=4096 --txd=4096 --hairpinq=1 -i
> testpmd> flow create 0 ingress pattern eth src is 00:10:94:00:00:03 /
>     end actions queue index 1 / end
>
> I tried to use locked device memory for TX and RX queues, but it seems
> that this is not supported:
> "--hairpin-mode=0x011000" (bit 16 - hairpin TX queues will use locked
> device memory, bit 12 - hairpin RX queues will use locked device memory)

RxQ pinned in device memory requires firmware configuration [1]:

    mlxconfig -y -d $pci_addr set MEMIC_SIZE_LIMIT=0 HAIRPIN_DATA_BUFFER_LOCK=1
    mlxfwreset -y -d $pci_addr reset

[1]: https://doc.dpdk.org/guides/platform/mlx5.html?highlight=hairpin_data_buffer_lock

However, pinned RxQ didn't improve anything for me.

TxQ pinned in device memory is not supported by net/mlx5.
TxQ pinned to DPDK memory made performance awful (predictably).

> I was expecting that achieving full throughput with hairpin queues would
> not be a problem.
> Is my expectation too optimistic?
>
> What other parameters besides 'hp_buf_log_sz' can I use to achieve full
> throughput?

In my experiments, the default "hp_buf_log_sz" of 16 is optimal.
The most influential parameter appears to be the number of hairpin queues.

> I tried combining the following parameters: mprq_en=, rxqs_min_mprq=,
> mprq_log_stride_num=, txq_inline_mpw=, rxq_pkt_pad_en=,
> but with no positive impact on throughput.
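
P.S. In case you later move this from testpmd into your own application:
the testpmd --hairpin-mode bits correspond to fields of
struct rte_eth_hairpin_conf passed to rte_eth_rx_hairpin_queue_setup() /
rte_eth_tx_hairpin_queue_setup(). Below is a rough, untested sketch of a
single-port hairpin Rx/Tx pair with explicit Tx rules enabled; the helper
name, queue indices and descriptor count are placeholders of mine, not
something testpmd does verbatim:

/*
 * Sketch only: one Rx hairpin queue peered with one Tx hairpin queue on
 * the same port, with explicit Tx flow rules. Queue indices must point
 * at the hairpin queue range configured for the port.
 */
#include <rte_ethdev.h>

static int
setup_hairpin_pair(uint16_t port_id, uint16_t rxq, uint16_t txq,
		   uint16_t nb_desc)
{
	struct rte_eth_hairpin_cap cap;
	struct rte_eth_hairpin_conf conf = {
		.peer_count = 1,
		.tx_explicit = 1,	/* same as testpmd --hairpin-mode bit 4 (0x10) */
		/* .use_locked_device_memory = 1,  -- see the note below */
	};
	int ret;

	/* Make sure the PMD supports hairpin at all. */
	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
	if (ret != 0)
		return ret;

	/* The Rx hairpin queue is peered with the Tx hairpin queue... */
	conf.peers[0].port = port_id;
	conf.peers[0].queue = txq;
	ret = rte_eth_rx_hairpin_queue_setup(port_id, rxq, nb_desc, &conf);
	if (ret != 0)
		return ret;

	/* ...and the Tx hairpin queue is peered back with the Rx one. */
	conf.peers[0].queue = rxq;
	return rte_eth_tx_hairpin_queue_setup(port_id, txq, nb_desc, &conf);
}

The locked-memory bits you tried (12 and 16) map onto the
use_locked_device_memory flag of the same structure; as noted above, for
net/mlx5 only the Rx side honors it, and only after the
HAIRPIN_DATA_BUFFER_LOCK firmware setting is enabled.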