On 7/16/2024 7:37 AM, Vipin Varghese wrote:
> Goal of the patch is to improve SSE macswap on x86_64 by reducing
> the stalls in backend engine. Original implementation of the SSE
> macswap makes loop call to multiple load, shuffle & store. Using
> SIMD ISA interleaving we can reduce the stalls for
>  - load SSE token exhaustion
>  - Shuffle and Load dependency
>
> Also other changes which improves packet per second are
>  - Filling access to MBUF for offload flags which is separate cacheline,
>  - using register keyword
>
> Build test using meson script:
> ``````````````````````````````
>
> build-gcc-static
> buildtools
> build-gcc-shared
> build-mini
> build-clang-static
> build-clang-shared
> build-x86-generic
>
> Test Results:
> `````````````
>
> Platform-1: AMD EPYC SIENA 8594P @2.3GHz, no boost
>
> ------------------------------------------------
> TEST IO 64B: baseline <NIC : MPPs>
>  - mellanox CX-7 2*200Gbps : 42.0
>  - intel E810 1*100Gbps : 82.0
>  - intel E810 2*200Gbps (2CQ-DA2): 82.45
> ------------------------------------------------
> TEST MACSWAP 64B: <NIC : Before : After>
>  - mellanox CX-7 2*200Gbps : 31.533 : 31.90
>  - intel E810 1*100Gbps : 50.380 : 47.0
>  - intel E810 2*200Gbps (2CQ-DA2): 48.840 : 49.827
> ------------------------------------------------
> TEST MACSWAP 128B: <NIC : Before: After>
>  - mellanox CX-7 2*200Gbps: 30.946 : 31.770
>  - intel E810 1*100Gbps: 49.386 : 46.366
>  - intel E810 2*200Gbps (2CQ-DA2): 47.979 : 49.503
> ------------------------------------------------
> TEST MACSWAP 256B: <NIC: Before: After>
>  - mellanox CX-7 2*200Gbps: 32.480 : 33.150
>  - intel E810 1 * 100Gbps: 45.29 : 44.571
>  - intel E810 2 * 200Gbps (2CQ-DA2): 45.033 : 45.117
> ------------------------------------------------
>
> Platform-2: AMD EPYC 9554 @3.1GHz, no boost
>
> ------------------------------------------------
> TEST IO 64B: baseline <NIC : MPPs>
>  - intel E810 2*200Gbps (2CQ-DA2): 82.49
> ------------------------------------------------
> <NIC intel E810 2*200Gbps (2CQ-DA2): Before : After>
> TEST MACSWAP: 1Q 1C1T
>  64B: : 45.0 : 45.54
>  128B: : 44.48 : 44.43
>  256B: : 42.0 : 41.99
> +++++++++++++++++++++++++
> TEST MACSWAP: 2Q 2C2T
>  64B: : 59.5 : 60.55
>  128B: : 56.78 : 58.1
>  256B: : 41.85 : 41.99
> ------------------------------------------------
>
> Signed-off-by: Vipin Varghese <vipin.vargh...@amd.com>
>
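For readers following the thread, the interleaving idea in the commit message can be sketched roughly as below. This is only an illustration under assumptions, not the code from the patch: the function names (macswap_scalar, macswap4_interleaved, macswap_selftest) are hypothetical, it works on raw 16-byte header buffers instead of mbufs, and it assumes GCC or clang on x86_64 with SSSE3 available at run time. The point is the ordering: all four loads are issued first, then all four shuffles, then all four stores, so that no shuffle waits directly on the load issued just before it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <tmmintrin.h> /* SSSE3: _mm_shuffle_epi8 */

/* Scalar reference: swap the 6-byte destination and source MAC addresses. */
static void macswap_scalar(uint8_t *hdr)
{
    uint8_t tmp[6];

    memcpy(tmp, hdr, 6);
    memcpy(hdr, hdr + 6, 6);
    memcpy(hdr + 6, tmp, 6);
}

/*
 * Hypothetical helper: swap the MAC addresses of four headers.
 * Each pointer must reference at least 16 readable and writable bytes;
 * bytes 12..15 are loaded and written back unchanged.
 */
__attribute__((target("ssse3")))
static void macswap4_interleaved(uint8_t *h0, uint8_t *h1,
                                 uint8_t *h2, uint8_t *h3)
{
    /* Output byte i takes input byte mask[i]: 0..5 <-> 6..11, 12..15 kept. */
    const __m128i mask = _mm_set_epi8(15, 14, 13, 12,
                                      5, 4, 3, 2, 1, 0,
                                      11, 10, 9, 8, 7, 6);

    /* Stage 1: issue all loads back to back. */
    __m128i a0 = _mm_loadu_si128((const __m128i *)h0);
    __m128i a1 = _mm_loadu_si128((const __m128i *)h1);
    __m128i a2 = _mm_loadu_si128((const __m128i *)h2);
    __m128i a3 = _mm_loadu_si128((const __m128i *)h3);

    /* Stage 2: shuffle; each shuffle depends on a load issued well earlier. */
    a0 = _mm_shuffle_epi8(a0, mask);
    a1 = _mm_shuffle_epi8(a1, mask);
    a2 = _mm_shuffle_epi8(a2, mask);
    a3 = _mm_shuffle_epi8(a3, mask);

    /* Stage 3: store all results. */
    _mm_storeu_si128((__m128i *)h0, a0);
    _mm_storeu_si128((__m128i *)h1, a1);
    _mm_storeu_si128((__m128i *)h2, a2);
    _mm_storeu_si128((__m128i *)h3, a3);
}

/* Returns 0 when the SIMD path matches the scalar reference on four buffers. */
static int macswap_selftest(void)
{
    uint8_t simd[4][16], ref[4][16];

    for (int p = 0; p < 4; p++)
        for (int i = 0; i < 16; i++)
            simd[p][i] = ref[p][i] = (uint8_t)(p * 16 + i);

    macswap4_interleaved(simd[0], simd[1], simd[2], simd[3]);
    for (int p = 0; p < 4; p++)
        macswap_scalar(ref[p]);

    return memcmp(simd, ref, sizeof(simd));
}
```

Whether this ordering actually reduces backend stalls depends on the core and the NIC path; the numbers quoted above move in both directions (for example the E810 1*100Gbps 64B case drops from 50.380 to 47.0), which is why measurements on other platforms matter.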
Hi Bruce, John,

Can you please help test macswap performance with this patch on Intel
platforms, to be sure it is not causing a regression?

The other option is to get this patch into -rc3 and have it tested there,
on the condition that it is removed if any regression is found, if that
helps with testing the patch.

Thanks,
ferruh