On Mon, 19 Jan 2015 09:53:34 +0800 zhihong.wang at intel.com wrote:
> Main code changes:
>
> 1. Differentiate architectural features based on CPU flags
>
> a. Implement separated move functions for SSE/AVX/AVX2 to make full
> utilization of cache bandwidth
>
> b. Implement separated copy flow specifically optimized for target
> architecture
>
> 2. Rewrite the memcpy function "rte_memcpy"
>
> a. Add store aligning
>
> b. Add load aligning based on architectural features
>
> c. Put block copy loop into inline move functions for better control of
> instruction order
>
> d. Eliminate unnecessary MOVs
>
> 3. Rewrite the inline move functions
>
> a. Add move functions for unaligned load cases
>
> b. Change instruction order in copy loops for better pipeline utilization
>
> c. Use intrinsics instead of assembly code
>
> 4. Remove slow glibc call for constant copies
>
> Signed-off-by: Zhihong Wang <zhihong.wang at intel.com>

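For readers unfamiliar with the "store aligning" and "unaligned load" items (2a, 3a, 3c), here is a minimal sketch of the general idea. It is not the code from the patch or from rte_memcpy: the helper name `copy_store_aligned` is hypothetical, and it assumes 16-byte SSE2 blocks purely for illustration (the patch targets SSE/AVX/AVX2 separately). The copy first moves just enough head bytes to put the destination on a 16-byte boundary, so every block store in the loop is aligned while loads from the source stay unaligned; without SSE2 it falls back to plain `memcpy` per block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not rte_memcpy): copy n bytes, aligning the
 * destination to 16 bytes first so every block store is aligned. */
static void *copy_store_aligned(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    /* Head: copy just enough bytes to bring d onto a 16-byte boundary. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Body: d is now 16-byte aligned (or n is 0); s may be unaligned. */
    while (n >= 16) {
#if defined(__SSE2__)
        __m128i v = _mm_loadu_si128((const __m128i *)s); /* unaligned load */
        _mm_store_si128((__m128i *)d, v);                /* aligned store */
#else
        memcpy(d, s, 16); /* portable fallback without SSE2 */
#endif
        d += 16;
        s += 16;
        n -= 16;
    }

    /* Tail: fewer than 16 bytes remain. */
    memcpy(d, s, n);
    return dst;
}
```

The design choice being illustrated is that misaligned stores are typically costlier than misaligned loads on x86, so paying a small head copy to align the destination lets the bulk loop use aligned stores regardless of source alignment.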
Dumb question: why not fix glibc memcpy instead? What is special about rte_memcpy?