> On Fri, Dec 16, 2016 at 10:19:43AM +0000, Yang, Zhiyong wrote:
> > > > I ran the same virtio/vhost loopback tests without a NIC.
> > > > I can see the throughput drop when choosing functions at run
> > > > time, compared to the original code, as follows on the same
> > > > platform (my machine is Haswell):
> > > >
> > > > Packet size    perf drop
> > > > 64             -4%
> > > > 256            -5.4%
> > > > 1024           -5%
> > > > 1500           -2.5%
> > > >
> > > > Another thing: when I run memcpy_perf_autotest, for N <= 128 the
> > > > rte_memcpy perf gains almost disappear when choosing functions
> > > > at run time. For other values of N, the perf gains become narrow.
> > > >
> > > How narrow? How significant is the improvement that we gain for
> > > having to maintain our own copy of memcpy? If the libc version is
> > > nearly as good, we should just use that.
> > >
> > > /Bruce
> >
> > Zhihong sent a patch about rte_memcpy. From the patch, we can see
> > that the optimization work on memcpy brings obvious perf
> > improvements over glibc for DPDK.
>
> Just a clarification: it's better than the __original DPDK__
> rte_memcpy, but not the glibc one. That makes me wonder: has anyone
> tested the memcpy with big packets? Does the one from DPDK outweigh
> the one from glibc, even for big packets?
>
> 	--yliu
>
I have tested the loopback performance of rte_memcpy and glibc memcpy.
For both small packets and big packets, rte_memcpy has the better
performance.

My test environment is the following:

CPU: BDW
OS: Ubuntu 16.04
Kernel: 4.4.0
gcc: 5.4.0
Path: mergeable

Size    rte_memcpy performance gain
64      31%
128     35%
260     27%
520     33%
1024    18%
1500    12%
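
For reference, a comparison of this kind can be reproduced with a loop
along the following lines. This is a minimal sketch, not the actual
test code I used and not memcpy_perf_autotest; ITERS and N are
arbitrary illustrative values, and it assumes the DPDK headers are
available:

/*
 * Time ITERS copies of N bytes with glibc memcpy and with rte_memcpy.
 * Build against DPDK (e.g. pkg-config libdpdk) with optimizations on.
 * Note: with a fixed size and fixed buffers the compiler may fold the
 * loop; the real autotest avoids this by varying buffer offsets.
 */
#include <stdio.h>
#include <string.h>
#include <inttypes.h>
#include <rte_memcpy.h>
#include <rte_cycles.h>

#define ITERS 1000000
#define N     1500              /* copy size under test */

static uint8_t src[N], dst[N];

int main(void)
{
        uint64_t start, c_glibc, c_rte;
        int i;

        start = rte_rdtsc();
        for (i = 0; i < ITERS; i++)
                memcpy(dst, src, N);
        c_glibc = rte_rdtsc() - start;

        start = rte_rdtsc();
        for (i = 0; i < ITERS; i++)
                rte_memcpy(dst, src, N);
        c_rte = rte_rdtsc() - start;

        /* consume dst so the copies are not optimized away */
        printf("glibc: %" PRIu64 " cycles, rte: %" PRIu64
               " cycles (dst[0]=%u)\n", c_glibc, c_rte,
               (unsigned)dst[0]);
        return 0;
}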
--Lei

> > http://www.dpdk.org/dev/patchwork/patch/17753/
> > The git log is as follows:
> >
> > This patch is tested on Ivy Bridge, Haswell and Skylake; it
> > provides up to 20% gain for Virtio Vhost PVP traffic, with packet
> > sizes ranging from 64 to 1500 bytes.
> >
> > thanks
> > Zhiyong
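
For context, "choosing functions at run time" as discussed above
amounts to dispatching through a function pointer selected once at
startup. The following is a rough sketch, not the actual patch; the
two implementation names are hypothetical placeholders that simply
forward to memcpy, where the real code would carry SSE/AVX2 bodies:

/*
 * Pick a memcpy implementation once, based on CPU features, then call
 * it through a pointer.  The indirect call defeats inlining of small
 * constant-size copies, which is consistent with the gains vanishing
 * for N <= 128 in memcpy_perf_autotest.
 */
#include <stddef.h>
#include <string.h>

static void *memcpy_avx2(void *d, const void *s, size_t n)
{
        return memcpy(d, s, n);   /* placeholder for an AVX2 body */
}

static void *memcpy_sse(void *d, const void *s, size_t n)
{
        return memcpy(d, s, n);   /* placeholder for an SSE body */
}

void *(*rte_memcpy_ptr)(void *, const void *, size_t);

__attribute__((constructor))
static void rte_memcpy_select(void)
{
        __builtin_cpu_init();
        if (__builtin_cpu_supports("avx2"))
                rte_memcpy_ptr = memcpy_avx2;
        else
                rte_memcpy_ptr = memcpy_sse;
}

Hot paths then call rte_memcpy_ptr(dst, src, n), paying one indirect
call per copy instead of getting a fully inlined, size-specialized
copy, which is where the reported 2.5-5.4% loopback drop can come from.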