Hi,

> Hi
>
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Tuesday, October 3, 2017 00:39
> > To: Li, Xiaoyun <xiaoyun...@intel.com>; Richardson, Bruce
> > <bruce.richard...@intel.com>
> > Cc: Lu, Wenzhuo <wenzhuo...@intel.com>; Zhang, Helin
> > <helin.zh...@intel.com>; dev@dpdk.org
> > Subject: RE: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
> >
> > > -----Original Message-----
> > > From: Li, Xiaoyun
> > > Sent: Monday, October 2, 2017 5:13 PM
> > > To: Ananyev, Konstantin <konstantin.anan...@intel.com>; Richardson,
> > > Bruce <bruce.richard...@intel.com>
> > > Cc: Lu, Wenzhuo <wenzhuo...@intel.com>; Zhang, Helin
> > > <helin.zh...@intel.com>; dev@dpdk.org; Li, Xiaoyun <xiaoyun...@intel.com>
> > > Subject: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
> > >
> > > This patch dynamically selects memcpy functions at run-time based
> > > on the CPU flags that the current machine supports. This patch uses
> > > function pointers which are bound to the corresponding functions at
> > > constructor time.
> > > In addition, the AVX512 instruction set is compiled only if users
> > > enable it in the config and the compiler supports it.
> > >
> > > Signed-off-by: Xiaoyun Li <xiaoyun...@intel.com>
> > > ---
> > > v2
> > > * Use gcc function multi-versioning to avoid compilation issues.
> > > * Add macros for AVX512 and AVX2. Only if users enable AVX512 and the
> > > compiler supports it, the AVX512 code is compiled. Only if the
> > > compiler supports AVX2, the AVX2 code is compiled.
> > >
> > > v3
> > > * Reduce function calls by keeping only rte_memcpy_xxx.
> > > * Add a condition: when the copy size is small, use the inline code path.
> > > Otherwise, use the dynamic code path.
> > > * To support attribute target, clang version must be greater than 3.7.
> > > Otherwise, the SSE/AVX code path is chosen, the same as before.
> > > * Move two macro functions to the top of the code since they are
> > > used in both the inline SSE/AVX and the dynamic SSE/AVX code.
> > >
> > > v4
> > > * Move code from rte_memcpy.h into several .c files and modify the
> > > makefiles to compile the AVX2 and AVX512 files.
> >
> > Could you explain to me why, instead of reusing the existing rte_memcpy()
> > code to generate _sse/_avx2/_avx512f flavors, you keep pushing changes
> > with 3 separate implementations?
> > Obviously that is much more expensive in terms of maintenance and doesn't
> > look like a feasible solution to me.
> > Is the existing rte_memcpy() implementation not good enough in terms of
> > functionality and/or performance?
> > If so, can you outline these problems and try to fix them first?
> > Konstantin
> >
>
> I just changed many small functions into one function in each of those
> 3 separate functions.

Yes, so with what you suggest we'll have 4 implementations of rte_memcpy to support.
That's very expensive in terms of maintenance and, I believe, totally unnecessary.

> Because the existing code is totally inline, including rte_memcpy() itself.
> So compilation turns all rte_memcpy() calls into basic code like xmm0=xxx.
>
> The existing code in this way is OK.

Good.

> But with run-time dispatch, it will bring lots of function calls
> and cause a perf drop.

I believe it wouldn't if we do it properly.
All internal functions (mov16, mov32, etc.) will still be inlined by the
compiler for each flavor (sse/avx2/etc.) - have a look at the patch I sent.
Konstantin
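
To make the point about inlining concrete, here is a minimal sketch of the idea, not the patch itself. It assumes GCC or a recent clang; every sketch_* name is hypothetical and none is a DPDK symbol. One generic copy body built from an always-inline 16-byte helper is compiled into two flavors, and a function pointer is bound once at constructor time, so the only call overhead at run time is the single indirect call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All sketch_* names are hypothetical; none of them is a DPDK symbol. */

/* Shared 16-byte building block, forced inline into every flavor. */
static inline __attribute__((always_inline)) void
sketch_mov16(uint8_t *dst, const uint8_t *src)
{
    /* Fixed-size copy; compilers typically lower this to vector moves. */
    memcpy(dst, src, 16);
}

/* Single generic body: the only copy logic that has to be maintained. */
static inline __attribute__((always_inline)) void *
sketch_memcpy_body(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    for (; n >= 16; n -= 16, d += 16, s += 16)
        sketch_mov16(d, s);
    for (; n > 0; n--)
        *d++ = *s++;
    return dst;
}

/* Baseline flavor, built with the default target flags. */
static void *
sketch_memcpy_base(void *dst, const void *src, size_t n)
{
    return sketch_memcpy_body(dst, src, n);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_DISPATCH 1
/* AVX2 flavor: same body, re-inlined and compiled with AVX2 enabled. */
__attribute__((target("avx2")))
static void *
sketch_memcpy_avx2(void *dst, const void *src, size_t n)
{
    return sketch_memcpy_body(dst, src, n);
}
#endif

/* Dispatch pointer; defaults to the baseline flavor. */
static void *(*sketch_memcpy_ptr)(void *, const void *, size_t) =
    sketch_memcpy_base;

/* Bind the pointer once, before main(), based on CPU flags. */
__attribute__((constructor))
static void
sketch_memcpy_init(void)
{
#ifdef SKETCH_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        sketch_memcpy_ptr = sketch_memcpy_avx2;
#endif
}

/* Public entry point: one indirect call, everything below it is inlined. */
static inline void *
sketch_memcpy(void *dst, const void *src, size_t n)
{
    return sketch_memcpy_ptr(dst, src, n);
}
```

A caller would simply use sketch_memcpy(dst, src, len). Adding another flavor would mean one more thin wrapper with a different target attribute and one more check in the constructor, while the copy logic stays in a single place.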