On 13 Mar 2019, at 10:19, Lollita Liu <lollita....@ericsson.com<mailto:lollita....@ericsson.com>> wrote:
Hi, Damjan. We found in vlib_buffer_free_inline with CLIB_HAVE_VEC128, buffers will always be freed one by one, not via the 4 packets batch fast path. The phenomenon is the same by calling via avf_device_class_tx_fn or via error_drop. It will add about 20 CPU cycles in my test case, the pain point is clib_atomic_sub_fetch. Perf result attached for your reference. 27.54% avf_plugin.so [.] avf_device_class_tx_fn_avx2 20.89% libvnet.so.19.04 [.] ip4_rewrite_node_fn_avx2 13.12% libvnet.so.19.04 [.] ip4_lookup_node_fn_avx2 12.47% dpdk_plugin.so [.] ipsec_tx_input_node_fn_avx2 8.14% libvlib.so.19.04 [.] vlib_worker_loop 4.65% libvlib.so.19.04 [.] dispatch_pending_node 4.64% libvnet.so.19.04 [.] vnet_interface_output_node 3.15% dpdk_plugin.so [.] ipsec_encrypt_tx_node_fn_avx2 1.60% dpdk_plugin.so [.] dpdk_crypto_input_node_fn_avx2 1.31% libvlib.so.19.04 [.] vlib_put_next_frame 1.02% libvlib.so.19.04 [.] vlib_get_next_frame_internal 0.44% libvppinfra.so.19.04 [.] mspace_usable_size_with_delta 0.44% libvppinfra.so.19.04 [.] mspace_usable_size 0.29% libc-2.23.so [.] __memmove_avx_unaligned 0.15% libvnet.so.19.04 [.] vlib_put_next_frame@plt 0.15% libvnet.so.19.04 [.] vnet_get_main 0.00% [kernel] [k] native_write_msr │ if (clib_atomic_sub_fetch (&b[0]->ref_count, 1) == 0) 0.29 │ lock subb $0x1,0xc(%rax) 20.48 │ ┌──jne 1955 │ │ mov %ecx,%eax │ │ add $0x1,%ecx │ │ { │ │ vlib_buffer_copy_template (b[0], &bt); 0.87 │192b:│ mov -0x4b0(%rbp),%rdx After checking the source code, I think the implement is different with or without CLIB_HAVE_VEC128. Without CLIB_HAVE_VEC128, as long as the four packets are not in buffer chain, with ref_count as 1, and allocated by same buffer pool, they will be freed in batch. With CLIB_HAVE_VEC128, even in those condition, the sum will be not 0, and will goto one_by_one. My suggestion is set ref_count in flags_refs_mask to 1, then if the ref_count is 0 or 1, it will go to fast path. At this point ref_count cannot be 0. This is not totally same as without CLIB_HAVE_VEC128, but it looks is the most simplified fix. vlib_buffer_t flags_refs_mask = { .flags = VLIB_BUFFER_NEXT_PRESENT, .ref_count = ~1 }; Makes sense to me, and it should be the same as ref_count==0 at this point is not possible. Can you please submit the patch. I know you already did it as part of bigger one which i half-merged. Going forward please make one fix per patch. Thanks -- Damjan
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#12504): https://lists.fd.io/g/vpp-dev/message/12504 Mute This Topic: https://lists.fd.io/mt/30414067/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-