Hello,

I've been debugging a similar problem where buffers "re-enter" bond-input with the sw_if_index of the bond interface. It turns out there are a few related issues: a buffer use-after-free and DPDK mempool cache breakage. Both manifest under severe buffer starvation.

In ip4_full_reass_finalize() there is a use-after-free that goes "unpunished" most of the time, but when VPP is out of buffers the freed buffer can be snatched and reused by another worker almost immediately, causing data corruption. Fix: https://gerrit.fd.io/r/c/vpp/+/44808
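For context, the shape of that bug is roughly the following. This is a minimal hypothetical sketch, not the actual reassembly code: the buffer is freed first and its metadata is read afterwards, which only appears to work as long as no other worker re-allocates the buffer in between.

#include <vlib/vlib.h>

/* Hypothetical illustration of the failure mode, NOT the actual
 * ip4_full_reass_finalize() code. */
static u32
sum_and_free (vlib_main_t *vm, u32 bi)
{
  vlib_buffer_t *b = vlib_get_buffer (vm, bi);

  /* The buffer is handed back to the allocator here ... */
  vlib_buffer_free_one (vm, bi);

  /* ... but its metadata is still read afterwards.  Most of the time the
   * memory is untouched and this "works"; under buffer starvation another
   * worker can re-allocate bi right away and overwrite it, so anything
   * read from (or written through) b is corrupted. */
  return b->current_length;
}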
Another issue is mishandling of the DPDK mempool cache. The mempool code expects a dequeue operation not to modify the obj_table passed to it when it fails to allocate the requested number of elements. In dpdk_ops_vpp_dequeue(), however, on buffer allocation failure the already-allocated buffers are freed, yet obj_table is still modified and references to the freed buffers are left in it, again leading to data corruption (a sketch of the expected contract is at the end of this message). Fix: https://gerrit.fd.io/r/c/vpp/+/44809

There is one more problem that manifests less often and that I'm still debugging, namely this assertion failure (visible in a debug build):

/data/tnsr-pkgs/work/vpp/src/vlib/buffer.h:294 (vlib_buffer_advance) assertion `b->current_length >= l' fails

Stack:
...
#9  0x000062300bfa4253 in os_panic () at /data/tnsr-pkgs/work/vpp/src/vpp/vnet/main.c:454
#10 0x00007eefb8ecdb99 in ?? () from /lib/x86_64-linux-gnu/libvppinfra.so.25.06
#11 0x00007eefb8ecd950 in _clib_error () from /lib/x86_64-linux-gnu/libvppinfra.so.25.06
#12 0x00007eefb9a61480 in vlib_buffer_advance (b=0x1001fe8ac0, l=14) at /data/tnsr-pkgs/work/vpp/src/vlib/buffer.h:304
#13 0x00007eefb9a5a682 in ethernet_input_inline (vm=0x7eef7f508240, node=0x7eef7fb0c980, from=0x7eef7f5d9ff0, n_packets=19, variant=ETHERNET_INPUT_VARIANT_ETHERNET) at /data/tnsr-pkgs/work/vpp/src/vnet/ethernet/node.c:1411
#14 0x00007eefb9a5992d in ethernet_input_node_fn_skx (vm=0x7eef7f508240, node=0x7eef7fb0c980, frame=0x7eef7f5d9fc0) at /data/tnsr-pkgs/work/vpp/src/vnet/ethernet/node.c:1779
#15 0x00007eefb8ff98a7 in dispatch_node (vm=0x7eef7f508240, node=0x7eef7fb0c980, type=VLIB_NODE_TYPE_INTERNAL, frame=0x7eef7f5d9fc0, dispatch_reason=VLIB_NODE_DISPATCH_REASON_PENDING_FRAME, last_time_stamp=25267290801901442) at /data/tnsr-pkgs/work/vpp/src/vlib/main.c:1042
#16 0x00007eefb8ffa556 in dispatch_pending_node (vm=0x7eef7f508240, pending_frame_index=1, last_time_stamp=25267290801901442) at /data/tnsr-pkgs/work/vpp/src/vlib/main.c:1200
#17 0x00007eefb8ff516b in vlib_main_or_worker_loop (vm=0x7eef7f508240, is_main=0) at /data/tnsr-pkgs/work/vpp/src/vlib/main.c:1739
#18 0x00007eefb8ffc556 in vlib_worker_thread_fn (arg=0x7eef7a4e22c0) at /data/tnsr-pkgs/work/vpp/src/vlib/main.c:2189
#19 0x00007eefb902d966 in vlib_worker_thread_bootstrap_fn (arg=0x7eef7a4e22c0) at /data/tnsr-pkgs/work/vpp/src/vlib/threads.c:490
...

I'm still debugging this one; it likely originates elsewhere. Overall, it would be nice to have more testing of VPP under buffer starvation conditions.

Hope this helps.

Ivan
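Regarding the obj_table contract mentioned above, here is a minimal hypothetical sketch of a dequeue callback that honors it. This is NOT the actual dpdk_ops_vpp_dequeue() nor the fix itself, and backend_get_one()/backend_put_one() are made-up placeholders for however the backing store hands elements out and takes them back:

#include <errno.h>
#include <rte_mempool.h>

/* Hypothetical placeholders for the backing store; not real DPDK or VPP
 * functions. */
static void *backend_get_one (struct rte_mempool *mp);
static void backend_put_one (struct rte_mempool *mp, void *obj);

/* Sketch of a dequeue op that is all-or-nothing with respect to obj_table:
 * on failure, obj_table is left exactly as the caller passed it in, because
 * the mempool cache code may keep using whatever it already stored there. */
static int
example_dequeue (struct rte_mempool *mp, void **obj_table, unsigned int n)
{
  void *tmp[RTE_MEMPOOL_CACHE_MAX_SIZE];
  unsigned int i, got = 0;

  if (n > RTE_DIM (tmp))
    return -ENOBUFS;

  /* Collect elements into a local scratch array first. */
  for (i = 0; i < n; i++)
    {
      tmp[i] = backend_get_one (mp);
      if (tmp[i] == NULL)
	break;
      got++;
    }

  if (got < n)
    {
      /* Partial result: give everything back and fail without writing a
       * single entry of obj_table. */
      for (i = 0; i < got; i++)
	backend_put_one (mp, tmp[i]);
      return -ENOENT;
    }

  /* Full success: only now publish the elements to the caller. */
  for (i = 0; i < n; i++)
    obj_table[i] = tmp[i];
  return 0;
}

Staging into a local array is what makes the operation all-or-nothing, so the mempool cache never ends up holding pointers to elements that were already returned to the pool.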
