The goal of this patch series is to improve SSE macswap performance on
x86_64 by reducing stalls in the CPU backend. The original SSE macswap
implementation issues multiple load, shuffle and store operations in
each loop iteration.
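
For readers unfamiliar with the technique, the load/shuffle/store sequence
can be sketched as below. This is only an illustration, not the code in
app/test-pmd/macswap_sse.h: the function names, the SSSE3_FN macro and the
raw byte-buffer interface are all hypothetical, and every buffer is assumed
to hold at least 16 bytes. The second function shows one possible way to
interleave independent loads and shuffles across four packets so that a
shuffle does not have to wait directly on the load issued just before it.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>	/* SSE2: _mm_loadu_si128, _mm_storeu_si128 */
#include <tmmintrin.h>	/* SSSE3: _mm_shuffle_epi8 */

/* Hypothetical macro: enables SSSE3 per function on GCC/clang. */
#define SSSE3_FN __attribute__((target("ssse3")))

/*
 * Shuffle mask: result bytes 0-5 come from source bytes 6-11 (src MAC),
 * result bytes 6-11 come from source bytes 0-5 (dst MAC), and bytes
 * 12-15 are left unchanged.
 */
#define MACSWAP_MASK \
	_mm_set_epi8(15, 14, 13, 12, 5, 4, 3, 2, 1, 0, 11, 10, 9, 8, 7, 6)

/* One packet per iteration: each shuffle waits on the load before it. */
SSSE3_FN void macswap_serial(uint8_t *pkts[], size_t n)
{
	const __m128i mask = MACSWAP_MASK;

	for (size_t i = 0; i < n; i++) {
		__m128i v = _mm_loadu_si128((const __m128i *)pkts[i]);

		v = _mm_shuffle_epi8(v, mask);
		_mm_storeu_si128((__m128i *)pkts[i], v);
	}
}

/* Four packets per iteration, with loads and shuffles interleaved. */
SSSE3_FN void macswap_interleaved(uint8_t *pkts[], size_t n)
{
	const __m128i mask = MACSWAP_MASK;
	size_t i = 0;

	for (; i + 4 <= n; i += 4) {
		__m128i v0 = _mm_loadu_si128((const __m128i *)pkts[i]);
		__m128i v1 = _mm_loadu_si128((const __m128i *)pkts[i + 1]);
		v0 = _mm_shuffle_epi8(v0, mask);
		__m128i v2 = _mm_loadu_si128((const __m128i *)pkts[i + 2]);
		v1 = _mm_shuffle_epi8(v1, mask);
		__m128i v3 = _mm_loadu_si128((const __m128i *)pkts[i + 3]);
		v2 = _mm_shuffle_epi8(v2, mask);
		v3 = _mm_shuffle_epi8(v3, mask);

		_mm_storeu_si128((__m128i *)pkts[i], v0);
		_mm_storeu_si128((__m128i *)pkts[i + 1], v1);
		_mm_storeu_si128((__m128i *)pkts[i + 2], v2);
		_mm_storeu_si128((__m128i *)pkts[i + 3], v3);
	}

	/* Tail: fewer than four packets left, handle them one by one. */
	for (; i < n; i++) {
		__m128i v = _mm_loadu_si128((const __m128i *)pkts[i]);

		v = _mm_shuffle_epi8(v, mask);
		_mm_storeu_si128((__m128i *)pkts[i], v);
	}
}
```

Both functions produce the same bytes; only the instruction order differs.
A compiler is free to reschedule either version, so any gain from the
source-level ordering has to be confirmed by measurement on the target
platform.
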

Using SIMD ISA interleaving, register variables and reduced L1 & L2 cache
eviction, we can reduce the stalls caused by
 - load SSE token exhaustion
 - shuffle and load dependency

Build test using meson script:
``````````````````````````````

build-gcc-static
buildtools
build-gcc-shared
build-mini
build-clang-static
build-clang-shared
build-x86-generic

Test Results:
`````````````

Platform-1: AMD EPYC SIENA 8594P @2.3GHz, no boost
Platform-2: AMD EPYC 9554 @3.1GHz, no boost

NIC:
 1) Mellanox CX-7 1*200Gbps
 2) Intel E810 1*100Gbps
 3) Intel E810 2*200Gbps (2CQ-DA2) - loopback
 4) Broadcom P2100 2*100Gbps - loopback

------------------------------------------------
TEST IO 64B: baseline <NIC : Mpps>
 - NIC-1: 42.0
 - NIC-2: 82.0
 - NIC-3: 82.45
 - NIC-3: 47.03
------------------------------------------------
TEST MACSWAP 64B: <NIC : Before : After>
 - NIC-1: 31.533 : 31.90
 - NIC-2: 48.0 : 48.9
 - NIC-3: 48.840 : 49.827
 - NIC-4: 44.3 : 45.5
------------------------------------------------
TEST MACSWAP 128B: <NIC : Before : After>
 - NIC-1: 30.946 : 31.770
 - NIC-2: 47.4 : 48.3
 - NIC-3: 47.979 : 48.503
 - NIC-4: 41.53 : 44.59
------------------------------------------------
TEST MACSWAP 256B: <NIC : Before : After>
 - NIC-1: 32.480 : 33.150
 - NIC-2: 45.29 : 45.571
 - NIC-3: 45.033 : 45.117
 - NIC-4: 36.49 : 37.5
------------------------------------------------

------------------------------------------------
TEST IO 64B: baseline <NIC : Mpps>
 - Intel E810 2*200Gbps (2CQ-DA2): 82.49
------------------------------------------------
<NIC Intel E810 2*200Gbps (2CQ-DA2): Before : After>
TEST MACSWAP: 1Q 1C1T
 64B: : 45.0 : 45.54
 128B: : 44.48 : 44.43
 256B: : 42.0 : 41.99
+++++++++++++++++++++++++
TEST MACSWAP: 2Q 2C2T
 64B: : 59.5 : 60.55
 128B: : 56.78 : 58.1
 256B: : 41.85 : 41.99
------------------------------------------------

Signed-off-by: Vipin Varghese <vipin.vargh...@amd.com>

Vipin Varghese (3):
  app/testpmd: add register keyword
  app/testpmd: move offload update
  app/testpmd: interleave SSE SIMD

 app/test-pmd/macswap_sse.h | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

--
2.34.1