On 01/09/2018 06:32 AM, Bruce Richardson wrote:
> This patch adds an AVX2 vectorized path to the i40e driver, based on the
> existing SSE4.2 version. Using AVX2 instructions gives better performance
> than the SSE version, though the percentage increase depends on the exact
> settings used. For example:
>

Hi Bruce,

Just curious, can you provide some hints on the percent increase in at
least some representative cases? I'm just trying to get a sense of
whether this is 5%, 10%, 20%, more... I know mileage will vary depending
on system, setup, configuration, etc.

Thanks,
John

> * Using 16B rather than 32B descriptors gives the biggest benefit since
>   2 descriptors at a time can be read, rather than just 1 when 32B ones
>   are used.
> * Bigger burst sizes for RX gives improved performance - while we see an
>   improvement with testpmd with the default burst size of 32, burst sizes
>   of up to 128 give further improvements
> * In my testing, most of the improvement comes from faster processing on
>   the RX path, though the improved TX also gives benefit.
>
> This has been tested on a system with CPU: "Intel(R) Xeon(R) Gold 6154 CPU
> @ 3.00GHz", and I've focused on testing with Rx ring sizes of approx 1k -
> generally --rxd=1024 and --txd=512, rather than the defaults which tend to
> give poorer zero-loss performance due to the smaller amount of buffering.
>
> V2:
> * Fixed incorrect config variable reference in makefile
> * Added missing stub function for when vector drivers are disabled
> * Added missing references to the new functions when checking for vector
>   code paths, e.g. for ring tear-down
>
> Bruce Richardson (2):
>   net/i40e: add AVX2 Tx function
>   net/i40e: add AVX2 Rx function
>
>  drivers/net/i40e/Makefile             |  19 +
>  drivers/net/i40e/i40e_rxtx.c          |  66 ++-
>  drivers/net/i40e/i40e_rxtx.h          |   6 +
>  drivers/net/i40e/i40e_rxtx_vec_avx2.c | 792 ++++++++++++++++++++++++++++++++++
>  4 files changed, 880 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/net/i40e/i40e_rxtx_vec_avx2.c
>