Hi all,
After further investigation, we have found that the patchset does bring some benefits.
So the plan is to add a config parameter, CONFIG_RTE_ENABLE_RUNTIME_DISPATCH.
By default its value is "n", and the current memcpy code is used.
Only if users set it to "y" is the run-time dispatch code (without
inlining) used.
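
To illustrate the idea, here is a minimal sketch of how such a switch could look. It is not the code from the patchset. It assumes the option reaches C code as a macro named RTE_ENABLE_RUNTIME_DISPATCH, and copy_generic, copy_avx2, copy_init and my_memcpy are hypothetical names; both variants simply call memcpy so that the sketch stays portable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical baseline copy routine (stands in for the current inline code). */
static inline void *copy_generic(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

#ifdef RTE_ENABLE_RUNTIME_DISPATCH /* assumed macro name for the "y" case */

/* Hypothetical AVX2 variant; a real one would be compiled with AVX2 enabled. */
static void *copy_avx2(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

typedef void *(*copy_fn_t)(void *, const void *, size_t);

/* Selected once at startup; every call then goes through this pointer. */
static copy_fn_t copy_ptr = copy_generic;

static void copy_init(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		copy_ptr = copy_avx2;
#endif
}

/* Not inlined into callers' hot loops: an indirect call per copy. */
static void *my_memcpy(void *dst, const void *src, size_t n)
{
	return copy_ptr(dst, src, n);
}

#else /* default "n": ISA fixed at build time, fully inlined */

static void copy_init(void)
{
}

static inline void *my_memcpy(void *dst, const void *src, size_t n)
{
	return copy_generic(dst, src, n);
}

#endif
```

A caller would run copy_init() once at startup and then use my_memcpy() everywhere. With the option off, the call collapses to the inline path, so the default build keeps today's behaviour.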


Best Regards,
Xiaoyun Li




> -----Original Message-----
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Li, Xiaoyun
> Sent: Tuesday, September 12, 2017 10:27
> To: Wang, Liang-min <liang-min.w...@intel.com>; Richardson, Bruce
> <bruce.richard...@intel.com>; Ananyev, Konstantin
> <konstantin.anan...@intel.com>
> Cc: Zhang, Qi Z <qi.z.zh...@intel.com>; Lu, Wenzhuo
> <wenzhuo...@intel.com>; Zhang, Helin <helin.zh...@intel.com>;
> pie...@emutex.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/x86: run-time dispatch over
> memcpy
> 
> Hi all,
> 
> After investigating, I found that most DPDK code already dispatches at
> run time. Only rte_memcpy chooses the ISA at build time.
> 
> There are two ways to modify memcpy. The first is function pointers and
> the other is function multi-versioning in GCC.
> 
> But memcpy has been heavily optimized and benefits from being fully
> inlined. If it is changed to run-time dispatch via function pointers, the
> perf will drop a lot, especially when the copy size is small.
> 
> And function multi-versioning in GCC only works for C++. GCC 6 is said to
> support it for C, but in my trial it did not work for C.
> 
> 
> 
> The attachment contains the perf results of memcpy with my patch, without
> my patch, and with the original DPDK code but without inline.
> 
> It's just for comparison, so for now I have only tested on Broadwell,
> using AVX2.
> 
> The results are from running test/test/test_memcpy_perf.c.
> 
> (C = compile-time constant)
> 
> /* Do aligned tests where size is a variable */
> 
> /* Do aligned tests where size is a compile-time constant */
> 
> /* Do unaligned tests where size is a variable */
> 
> /* Do unaligned tests where size is a compile-time constant */
> 
> 
> 
> In the results, "4-7" means DPDK takes time 4 and glibc takes time 7.
> 
> For sizes smaller than 128 bytes, this patch's perf is bad, even worse
> than glibc.
> 
> As the size grows, the perf is better than glibc but worse than the
> original DPDK.
> 
> And when the size grows above about 1024 bytes, it performs similarly to
> the original DPDK.
> 
> Furthermore, if inline is removed from the original DPDK, the perf is
> similar to the perf with the patch.
> 
> Different situations (4 types, such as cache to cache) perform
> differently, but the trend is the same (as size grows, perf grows).
> 
> 
> 
> So if dynamic dispatch is needed, some perf has to be sacrificed and the
> build has to target the minimum ISA (e.g. compile for target avx, run on
> avx, avx2, avx512f).
> 
> 
> 
> Thus, I think this feature shouldn't be delivered in this release.
> 
> 
> 
> Best Regards,
> 
> Xiaoyun Li
