[dpdk-dev] [PATCH] A fix to work around strict-aliasing rules breaking

2015-03-02 Thread zhihong.w...@intel.com
Fixed strict-aliasing rule violations that some GCC versions report as errors. Signed-off-by: Zhihong Wang --- .../common/include/arch/x86/rte_memcpy.h | 44 -- 1 file changed, 24 insertions(+), 20 deletions(-) diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h b
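The snippet is cut off before the diff, so the actual change is not visible here. As a generic illustration of the kind of problem being worked around, and not the patch's own code, here is a minimal sketch of one common way to move 8 bytes without reading a byte buffer through a `uint64_t *`. The helper name `copy8` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from rte_memcpy.h.
 *
 * The direct form
 *     *(uint64_t *)dst = *(const uint64_t *)src;
 * accesses uint8_t objects through a uint64_t lvalue, which the C
 * aliasing rules do not allow and which GCC may flag under
 * -Wstrict-aliasing. Going through a local with a fixed-size memcpy
 * avoids the problem; optimizing compilers typically lower each call
 * to a single load or store.
 */
static inline void copy8(uint8_t *dst, const uint8_t *src)
{
    uint64_t tmp;

    assert(dst != NULL && src != NULL);
    memcpy(&tmp, src, sizeof(tmp));  /* read 8 bytes, any alignment */
    memcpy(dst, &tmp, sizeof(tmp));  /* write 8 bytes, any alignment */
}
```

Other workarounds exist, such as a `may_alias` type attribute or building with `-fno-strict-aliasing`. Which one the patch uses cannot be told from the truncated snippet.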

[dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms

2015-01-19 Thread zhihong.w...@intel.com
Main code changes:
1. Differentiate architectural features based on CPU flags
   a. Implement separate move functions for SSE/AVX/AVX2 to fully utilize cache bandwidth
   b. Implement separate copy flows specifically optimized for the target architecture
2. Rewrite the memcpy functi
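Item 1 above can be pictured roughly as follows. This is a sketch under stated assumptions, not the patch's code. It selects a 32-byte move at compile time using the compiler's predefined `__AVX__` and `__SSE2__` macros, which stand in for whatever CPU-flag macros the patch actually tests. The names `move32` and `copy_blocks` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__AVX__) || defined(__SSE2__)
#include <immintrin.h>
#endif

/* Hypothetical 32-byte move; one body is chosen per target at compile time. */
static inline void move32(uint8_t *dst, const uint8_t *src)
{
#if defined(__AVX__)
    /* One 256-bit unaligned load and store. */
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    _mm256_storeu_si256((__m256i *)dst, v);
#elif defined(__SSE2__)
    /* Two 128-bit unaligned loads and stores. */
    __m128i lo = _mm_loadu_si128((const __m128i *)src);
    __m128i hi = _mm_loadu_si128((const __m128i *)(src + 16));
    _mm_storeu_si128((__m128i *)dst, lo);
    _mm_storeu_si128((__m128i *)(dst + 16), hi);
#else
    /* Portable fallback for targets without these extensions. */
    memcpy(dst, src, 32);
#endif
}

/* Hypothetical copy flow: 32-byte blocks first, then the remaining tail. */
static inline void *copy_blocks(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    if (n == 0)
        return dst;
    assert(d != NULL && s != NULL);

    while (n >= 32) {
        move32(d, s);
        d += 32;
        s += 32;
        n -= 32;
    }
    if (n > 0)
        memcpy(d, s, n);
    return dst;
}
```

Compile-time selection keeps the hot path free of runtime branches on CPU features, at the cost of needing a build per target.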

[dpdk-dev] [PATCH 3/4] app/test: Extended test coverage in test_memcpy_perf.c

2015-01-19 Thread zhihong.w...@intel.com
Main code changes:
1. Added more typical data points for a thorough performance test
2. Added unaligned test cases, since unaligned copies are common in DPDK usage
Signed-off-by: Zhihong Wang --- app/test/test_memcpy_perf.c | 238 +--- 1 file changed, 156 insertions(+),

[dpdk-dev] [PATCH 2/4] app/test: Removed unnecessary test cases in test_memcpy.c

2015-01-19 Thread zhihong.w...@intel.com
Removed unnecessary test cases for base move functions since the function "func_test" covers them all. Signed-off-by: Zhihong Wang --- app/test/test_memcpy.c | 52 +- 1 file changed, 1 insertion(+), 51 deletions(-) diff --git a/app/test/test_memc

[dpdk-dev] [PATCH 1/4] app/test: Disabled VTA for memcpy test in app/test/Makefile

2015-01-19 Thread zhihong.w...@intel.com
VTA (GCC's variable tracking assignments) is for debugging only; it increases compile time and binary size, especially when there are a lot of inlines. Disable it, since the memcpy test contains a lot of inline calls. Signed-off-by: Zhihong Wang --- app/test/Makefile | 6 ++ 1 file changed, 6 insertions(+) diff --git a/app/test/M

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-19 Thread zhihong.w...@intel.com
This patch set optimizes memcpy for DPDK for both SSE and AVX platforms. It also extends memcpy test coverage with unaligned cases and more test points. Optimization techniques are summarized below:
1. Utilize full cache bandwidth
2. Enforce aligned stores
3. Apply load address alignment based
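Technique 2 in the list above, enforcing aligned stores, could be sketched as follows. This is an illustration only, assuming SSE2 and 16-byte alignment; the patch's own implementation is not visible in the snippet, and the function name `copy_aligned_stores` is hypothetical. The idea is to copy a short head so that the destination pointer reaches an alignment boundary, then pair unaligned loads with aligned stores for the bulk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical sketch: align the destination, then use aligned stores. */
static void *copy_aligned_stores(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    size_t head;

    if (n == 0)
        return dst;
    assert(d != NULL && s != NULL);

    /* Head: bytes needed to bring d up to the next 16-byte boundary
     * (0 if d is already aligned), capped at n. */
    head = (size_t)(-(uintptr_t)d & 15);
    if (head > n)
        head = n;
    if (head > 0)
        memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

#if defined(__SSE2__)
    /* Bulk: d is now 16-byte aligned; s may still be unaligned. */
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);  /* unaligned load */
        _mm_store_si128((__m128i *)d, v);                 /* aligned store */
        d += 16;
        s += 16;
        n -= 16;
    }
#endif

    /* Tail: whatever is left (everything, on non-SSE2 targets). */
    if (n > 0)
        memcpy(d, s, n);
    return dst;
}
```

Aligning on the destination rather than the source reflects the list's stated preference for aligned stores; the loads absorb whatever misalignment remains.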