Hi, Bruce:

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Thursday, December 15, 2016 6:13 PM
> To: Yang, Zhiyong <zhiyong.y...@intel.com>
> Cc: Ananyev, Konstantin <konstantin.anan...@intel.com>; Thomas
> Monjalon <thomas.monja...@6wind.com>; dev@dpdk.org;
> yuanhan....@linux.intel.com; De Lara Guarch, Pablo
> <pablo.de.lara.gua...@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on
> IA platform
>
> On Thu, Dec 15, 2016 at 06:51:08AM +0000, Yang, Zhiyong wrote:
> > Hi, Thomas, Konstantin:
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yang, Zhiyong
> > > Sent: Sunday, December 11, 2016 8:33 PM
> > > To: Ananyev, Konstantin <konstantin.anan...@intel.com>; Thomas
> > > Monjalon <thomas.monja...@6wind.com>
> > > Cc: dev@dpdk.org; yuanhan....@linux.intel.com; Richardson, Bruce
> > > <bruce.richard...@intel.com>; De Lara Guarch, Pablo
> > > <pablo.de.lara.gua...@intel.com>
> > > Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > > on IA platform
> > >
> > > Hi, Konstantin, Bruce:
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Thursday, December 8, 2016 6:31 PM
> > > > To: Yang, Zhiyong <zhiyong.y...@intel.com>; Thomas Monjalon
> > > > <thomas.monja...@6wind.com>
> > > > Cc: dev@dpdk.org; yuanhan....@linux.intel.com; Richardson, Bruce
> > > > <bruce.richard...@intel.com>; De Lara Guarch, Pablo
> > > > <pablo.de.lara.gua...@intel.com>
> > > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce
> > > > rte_memset on IA platform
> > > >
> > > > > -----Original Message-----
> > > > > From: Yang, Zhiyong
> > > > > Sent: Thursday, December 8, 2016 9:53 AM
> > > > > To: Ananyev, Konstantin <konstantin.anan...@intel.com>; Thomas
> > > > > Monjalon <thomas.monja...@6wind.com>
> > > > > Cc: dev@dpdk.org; yuanhan....@linux.intel.com; Richardson, Bruce
> > > > > <bruce.richard...@intel.com>; De Lara Guarch, Pablo
> > > > > <pablo.de.lara.gua...@intel.com>
> > > > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce
> > > > > rte_memset on IA platform
> > > > >
> > > > extern void *(*__rte_memset_vector)(void *s, int c, size_t n);
> > > >
> > > > static inline void *
> > > > rte_memset_huge(void *s, int c, size_t n)
> > > > {
> > > > 	return __rte_memset_vector(s, c, n);
> > > > }
> > > >
> > > > static inline void *
> > > > rte_memset(void *s, int c, size_t n)
> > > > {
> > > > 	if (n < XXX)
> > > > 		return rte_memset_scalar(s, c, n);
> > > > 	else
> > > > 		return rte_memset_huge(s, c, n);
> > > > }
> > > >
> > > > XXX could be either a define, or could also be a variable, so it
> > > > can be set up at startup, depending on the architecture.
> > > >
> > > > Would that work?
> > > > Konstantin
> > > >
> > I have implemented the code for choosing the functions at run time.
> > rte_memcpy is used more frequently, so I tested it at run time.
> >
> > typedef void *(*rte_memcpy_vector_t)(void *dst, const void *src,
> > 				     size_t n);
> > extern rte_memcpy_vector_t rte_memcpy_vector;
> >
> > static inline void *
> > rte_memcpy(void *dst, const void *src, size_t n)
> > {
> > 	return rte_memcpy_vector(dst, src, n);
> > }
> >
> > To reduce the run-time overhead, I assign the function address to
> > rte_memcpy_vector in a constructor, so the variable is initialized
> > before main() starts:
> >
> > static void __attribute__((constructor))
> > rte_memcpy_init(void)
> > {
> > 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> > 		rte_memcpy_vector = rte_memcpy_avx2;
> > 	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
> > 		rte_memcpy_vector = rte_memcpy_sse;
> > 	else
> > 		rte_memcpy_vector = memcpy;
> > }
> >
> > I ran the same virtio/vhost loopback tests without a NIC.
> > I can see a throughput drop when choosing the functions at run time,
> > compared to the original code on the same platform (my machine is
> > Haswell):
> >
> > Packet size (bytes)   Perf drop
> > 64                    -4%
> > 256                   -5.4%
> > 1024                  -5%
> > 1500                  -2.5%
> >
> > Another thing: I ran memcpy_perf_autotest. When N <= 128, the
> > rte_memcpy perf gain almost disappears when choosing the functions at
> > run time. For other values of N, the perf gain becomes narrower.
> >
> How narrow? How significant is the improvement that we gain from having to
> maintain our own copy of memcpy? If the libc version is nearly as good, we
> should just use that.
>
> /Bruce
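For reference, here is a minimal, self-contained sketch that combines the two schemes discussed above: Konstantin's size-threshold split and the constructor-initialized function pointer. It is an illustration under stated assumptions, not the DPDK implementation. It assumes GCC or Clang on x86 and uses the compiler builtins __builtin_cpu_init and __builtin_cpu_supports in place of DPDK's rte_cpu_get_flag_enabled. All names (my_memcpy, copy_vector, copy_small, SMALL_COPY_THRESHOLD, copy_avx2, copy_sse41, copy_libc) and the threshold value of 128 are hypothetical, and the "AVX2" and "SSE4.1" variants are placeholders that simply call libc memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-off: copies shorter than this skip the indirect call. */
#define SMALL_COPY_THRESHOLD 128

typedef void *(*copy_fn_t)(void *dst, const void *src, size_t n);

/* Placeholders for real AVX2/SSE4.1 kernels: all three just call libc memcpy. */
static void *copy_avx2(void *dst, const void *src, size_t n) { return memcpy(dst, src, n); }
static void *copy_sse41(void *dst, const void *src, size_t n) { return memcpy(dst, src, n); }
static void *copy_libc(void *dst, const void *src, size_t n) { return memcpy(dst, src, n); }

/* Dispatch target; starts at a safe default until the constructor runs. */
static copy_fn_t copy_vector = copy_libc;

/* Runs before main(), so the CPU check happens once, not on every copy. */
static void __attribute__((constructor))
copy_init(void)
{
	__builtin_cpu_init(); /* make sure the CPU feature data is populated */
	if (__builtin_cpu_supports("avx2"))
		copy_vector = copy_avx2;
	else if (__builtin_cpu_supports("sse4.1"))
		copy_vector = copy_sse41;
	else
		copy_vector = copy_libc;
	assert(copy_vector != NULL); /* every branch must leave a valid target */
}

/* Small copies: a plain byte loop the compiler can inline (no indirect call). */
static inline void *copy_small(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/* Same contract as memcpy(): regions must not overlap; returns dst. */
static inline void *my_memcpy(void *dst, const void *src, size_t n)
{
	if (n < SMALL_COPY_THRESHOLD)
		return copy_small(dst, src, n);
	return copy_vector(dst, src, n);
}
```

With this split, only copies of SMALL_COPY_THRESHOLD bytes or more pay for the indirect call. Whether that recovers the small-size loss reported above would have to be measured with memcpy_perf_autotest; the sketch makes no claim about it.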
Zhihong sent a patch for rte_memcpy. That patch shows that optimizing
memcpy brings a clear performance improvement over glibc for DPDK:
http://www.dpdk.org/dev/patchwork/patch/17753/

The commit message says:

    This patch is tested on Ivy Bridge, Haswell and Skylake, it provides
    up to 20% gain for Virtio Vhost PVP traffic, with packet size ranging
    from 64 to 1500 bytes.

thanks
Zhiyong