Hi Adrián,

Are there any updates on this issue? We have done some analysis of our own, but
we are eagerly awaiting an authoritative explanation from the community.

Thanks very much!

From: Adrián Moreno
Sent: 2025-05-06 16:56
To: chenyongch...@cestc.cn
Cc: ovs-discuss
Subject: Re: [ovs-discuss] Adjusting the bond-rebalance-interval configuration
triggers the USERSPACE_INVALID_PORT_DROP error during PMD packet transmission

On Tue, Apr 29, 2025 at 01:48:42PM +0800, chenyongchang--- via discuss wrote:
>
> Hello,
> In a high-traffic scenario, when modifying the bond-rebalance-interval 
> configuration for an OVS-DPDK bond interface,
> we observed that OVS-DPDK generated USERSPACE_INVALID_PORT_DROP errors.
>
> After analysis, executing the command ovs-vsctl set port dpdk_tun_port 
> other_config:bond-rebalance-interval=1000
> triggered the following process, ultimately leading to the 
> USERSPACE_INVALID_PORT_DROP errors:
>
> 1. Execution of memset(bond->hash, 0, hash_len);
> Call stack:
> #0 bond_entry_reset (bond=0x4c64bc0) at ofproto/bond.c:1852
> #1 0x0000000001a2a238 in bond_reconfigure (bond=0x4c64bc0, s=0x7fff6d1dec10) 
> at ofproto/bond.c:514
> #2 0x0000000001a4e253 in bundle_set (ofproto_=0x4c21110, aux=0x4c39d90, 
> s=0x7fff6d1deb90) at ofproto/ofproto-dpif.c:3484
> #3 0x0000000001a31b27 in ofproto_bundle_register (ofproto=0x4c21110, 
> aux=0x4c39d90, s=0x7fff6d1deb90) at ofproto/ofproto.c:1430
> #4 0x0000000001a1c80e in port_configure (port=0x4c39d90) at 
> vswitchd/bridge.c:1384
> #5 0x0000000001a1b7b3 in bridge_reconfigure (ovs_cfg=0x4bb37c0) at 
> vswitchd/bridge.c:1005
> #6 0x0000000001a223e7 in bridge_run () at vswitchd/bridge.c:3423
> #7 0x0000000001a27b9e in main (argc=11, argv=0x7fff6d1def38) at 
> vswitchd/ovs-vswitchd.c:129
>
> 2. Execution of member_map[i] = OFPP_NONE
> Call stack:
> #0 bond_add_lb_output_buckets (bond=0x37220f0) at ofproto/bond.c:2135
> #1 0x0000000001a29b4f in update_recirc_rules__ (bond=0x37220f0) at 
> ofproto/bond.c:356
> #2 0x0000000001a29ebe in update_recirc_rules (bond=0x37220f0) at 
> ofproto/bond.c:426
> #3 0x0000000001a2a262 in bond_reconfigure (bond=0x37220f0, s=0x7fffffffe230) 
> at ofproto/bond.c:520
> #4 0x0000000001a4e292 in bundle_set (ofproto_=0x366afa0, aux=0x3713290, 
> s=0x7fffffffe1b0) at ofproto/ofproto-dpif.c:3484
> #5 0x0000000001a31b66 in ofproto_bundle_register (ofproto=0x366afa0, 
> aux=0x3713290, s=0x7fffffffe1b0) at ofproto/ofproto.c:1430
> #6 0x0000000001a1c80e in port_configure (port=0x3713290) at 
> vswitchd/bridge.c:1384
> #7 0x0000000001a1b7b3 in bridge_reconfigure (ovs_cfg=0x3660180) at 
> vswitchd/bridge.c:1005
> #8 0x0000000001a223b7 in bridge_run () at vswitchd/bridge.c:3422
> #9 0x0000000001a27b92 in main (argc=1, argv=0x7fffffffe558)
>
> 3. PMD thread sending packets found port_no=0xffffffff
> Call stack:
> #0  dp_execute_output_action (pmd=0x7fff68731010, packets_=0x7fff53ff8f50, 
> should_steal=true, port_no=4294967295)
>     at lib/dpif-netdev.c:9273
> #1  0x0000000001acaf6d in dp_execute_lb_output_action (pmd=0x7fff68731010, 
> packets_=0x7fff53ff9ca0, should_steal=true,
>     bond=1) at lib/dpif-netdev.c:9350
> #2  0x0000000001acb0b6 in dp_execute_cb (aux_=0x7fff53ff9b30, 
> packets_=0x7fff53ff9ca0, a=0x7fff4800f074, should_steal=true)
>     at lib/dpif-netdev.c:9379
> #3  0x0000000001b526b5 in odp_execute_actions (dp=0x7fff53ff9b30, 
> batch=0x7fff53ff9ca0, steal=true,
>     actions=0x7fff4800f074, actions_len=8, dp_execute_action=0x1acafc0 
> <dp_execute_cb>) at lib/odp-execute.c:1016
> #4  0x0000000001acbc8e in dp_netdev_execute_actions (pmd=0x7fff68731010, 
> packets=0x7fff53ff9ca0, should_steal=true,
>     flow=0x7fff4800ea70, actions=0x7fff4800f074, actions_len=8) at 
> lib/dpif-netdev.c:9698
> #5  0x0000000001ac8133 in packet_batch_per_flow_execute 
> (batch=0x7fff53ff9c90, pmd=0x7fff68731010)
>     at lib/dpif-netdev.c:8338
> #6  0x0000000001aca3ad in dp_netdev_input__ (pmd=0x7fff68731010, 
> packets=0x7fff53ffbdf0, md_is_valid=false, port_no=4)
>     at lib/dpif-netdev.c:9055
> #7  0x0000000001aca3ff in dp_netdev_input (pmd=0x7fff68731010, 
> packets=0x7fff53ffbdf0, port_no=4) at lib/dpif-netdev.c:9064
> #8  0x0000000001ac0da2 in dp_netdev_process_rxq_port (pmd=0x7fff68731010, 
> rxq=0x3720220, port_no=4)
>     at lib/dpif-netdev.c:5690
> #9  0x0000000001ac566a in pmd_thread_main (f_=0x7fff68731010) at 
> lib/dpif-netdev.c:7334
> #10 0x0000000001bc4b1b in ovsthread_wrapper (aux_=0x3711920) at 
> lib/ovs-thread.c:422
> #11 0x00007ffff76f4802 in start_thread () from /lib64/libc.so.6
> #12 0x00007ffff7694314 in clone () from /lib64/libc.so.6
>
> The main issue arises from a timing discrepancy between the main thread and 
> the PMD thread when operating on pmd->tx_bonds,
> which causes the PMD to temporarily resolve the egress interface to 
> 0xffffffff (an invalid value).
> What solutions does the community propose to address this problem?
 
Reconfiguring the bonding flows for a simple change in the
rebalance interval seems like overkill. This behavior was added so that users
could disable rebalancing, but just increasing or decreasing the interval
(without the initial or final value being zero) should not trigger a bond
reset.
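
As a rough illustration of that condition only (this is not OVS code, and the
helper name and millisecond parameters are hypothetical), the check could be
sketched as "reset only when rebalancing is being switched on or off":

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not an existing OVS function.
 * Returns true only when exactly one of the old and new intervals is zero,
 * i.e. when rebalancing is being enabled or disabled.  A change between two
 * non-zero intervals returns false and would not need a bond reset. */
static bool
rebalance_change_needs_reset(int old_interval_ms, int new_interval_ms)
{
    assert(old_interval_ms >= 0 && new_interval_ms >= 0);
    return (old_interval_ms == 0) != (new_interval_ms == 0);
}
```

With this sketch, going from 10000 to 1000 would leave the bond untouched,
while going from 0 to 1000 (or from 1000 to 0) would still reset it.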
 
Also, if bond_reconfigure resets the bond hashes, we should probably not
wait until bond_run() calls bond_update_post_recirc_rules__() to initialize
them. Even for recirc-driven bonds, this causes an initial update of the
post-recirc rules with all hashes being zero.
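
To make the idea concrete, here is a toy, single-threaded model (not OVS code;
the struct, the bucket count, PORT_NONE, and the round-robin assignment are all
assumptions made for illustration). It writes a valid member into every bucket
during the reset itself, so no bucket is left pointing at the invalid port
until a later pass fills it in. It does not model the synchronization between
the main thread and the PMD threads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS 256          /* Hypothetical number of hash buckets. */
#define PORT_NONE UINT32_MAX   /* Stand-in for the invalid port 0xffffffff. */

/* Toy bond: maps each hash bucket to an output port number. */
struct toy_bond {
    uint32_t bucket_port[N_BUCKETS];
};

/* Reset that assigns members round-robin right away.  Each bucket gets its
 * final value in a single store; PORT_NONE is never written. */
static void
toy_bond_reset(struct toy_bond *bond, const uint32_t *members,
               size_t n_members)
{
    assert(n_members > 0);
    for (size_t i = 0; i < N_BUCKETS; i++) {
        bond->bucket_port[i] = members[i % n_members];
    }
}

/* What a datapath thread would do in this model: pick a port by hash. */
static uint32_t
toy_bond_lookup(const struct toy_bond *bond, uint32_t hash)
{
    return bond->bucket_port[hash % N_BUCKETS];
}
```

In this model a lookup right after the reset always returns one of the member
ports, which is the property the transmit path needs.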
 
I'll take a deeper look probably next week.
 
>
> our ovs version 2.17.5 lts.
 
Note 2.17.5 is not EOL.
 
Thanks.
Adrián
 
 
 
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss