[dpdk-dev] [PATCH v3 00/10] split architecture specific operations
David, I acked the set of patches. Thanks a lot! Best Regards! -- Chao Zhu From: David Marchand To: dev at dpdk.org Cc: Chao CH Zhu/China/IBM at IBMCN Date: 2014/10/28 20:50 Subject:[PATCH v3 00/10] split architecture specific operations The set of patches split x86 architecture specific operations from DPDK and put them to x86 arch directory. This will make the adoption of DPDK much easier on other computer architecture. For a new architecture, just add an architecture specific directory and necessary building configuration files, then DPDK eal library can support it. Reviewing patchset from Chao, I ended up modifying it along the way, so here is a new iteration of this patchset. Changes since Chao v2 patchset : - added a preliminary patch for moving rte_atomic.h (for better readability) - fixed documentation generation - implemented a generic header for each arch specific header (cpuflags, memcpy, prefetch were missing) - removed C++ stuff from generic headers - centralised all doxygen stuff in generic headers (no need to have duplicates) - refactored rte_cycles functions - moved vmware tsc stuff to arch rte_cycles.h headers - finished x86 factorisation Little summary of current state : - all applications continue to include the eal headers as before, these headers are the arch-specific ones - the arch specific headers always include the generic ones. The generic headers contain the doxygen documentation and code common to all architectures - a x86 architecture has been defined which handles both 32bits and 64bits peculiarities It builds fine for 32/64 bits (w and w/o "force intrinsics"), but I really would like a lot of eyes on this (and I would say, especially, rte_cycles, rte_memcpy and rte_cpuflags). I still have some concerns about the use of intrinsics for architecture != x86 but I think Chao will be the best to look at this. -- David Marchand Chao Zhu (7): eal: split atomic operations to architecture specific eal: split byte order operations to architecture specific eal: split CPU cycle operation to architecture specific eal: split prefetch operations to architecture specific eal: split spinlock operations to architecture specific eal: split memcpy operation to architecture specific eal: split CPU flags operations to architecture specific David Marchand (3): eal: move rte_atomic.h header eal: install all arch headers eal: factorize x86 headers doc/api/doxy-api.conf |1 + lib/librte_eal/common/Makefile | 24 +- lib/librte_eal/common/eal_common_cpuflags.c| 190 .../common/include/arch/x86/rte_atomic.h | 216 .../common/include/arch/x86/rte_atomic_32.h| 222 .../common/include/arch/x86/rte_atomic_64.h| 191 .../common/include/arch/x86/rte_byteorder.h| 121 +++ .../common/include/arch/x86/rte_byteorder_32.h | 51 + .../common/include/arch/x86/rte_byteorder_64.h | 52 + .../common/include/arch/x86/rte_cpuflags.h | 310 ++ .../common/include/arch/x86/rte_cycles.h | 121 +++ .../common/include/arch/x86/rte_memcpy.h | 297 + .../common/include/arch/x86/rte_prefetch.h | 62 ++ .../common/include/arch/x86/rte_spinlock.h | 94 ++ lib/librte_eal/common/include/generic/rte_atomic.h | 918 .../common/include/generic/rte_byteorder.h | 189 .../common/include/generic/rte_cpuflags.h | 110 ++ lib/librte_eal/common/include/generic/rte_cycles.h | 209 lib/librte_eal/common/include/generic/rte_memcpy.h | 144 +++ .../common/include/generic/rte_prefetch.h | 71 ++ .../common/include/generic/rte_spinlock.h | 226 .../common/include/i686/arch/rte_atomic.h | 373 --- lib/librte_eal/common/include/rte_atomic.h | 1133 lib/librte_eal/common/include/rte_byteorder.h | 270 - lib/librte_eal/common/include/rte_cpuflags.h | 182 lib/librte_eal/common/include/rte_cycles.h | 266 - lib/librte_eal/common/include/rte_memcpy.h | 376 --- lib/librte_eal/common/include/rte_prefetch.h | 88 -- lib/librte_eal/common/include/rte_spinlock.h | 258 - .../common/include/x86_64/arch/rte_atomic.h| 335 -- mk/arch/i686/rte.vars.mk |2 + mk/arch/x86_64/rte.vars.mk |2 + 32 files changed, 3624 insertions(+), 3480 deletions(-) create mode 100644 lib/librte_eal/common/include/arch/x86/rte_atomic.h create mode 100644 lib/librte_eal/common/include/arch/x86/rte_atomic_32.h create mode 100644 lib/librte_eal/common/include/arch/x86/rte_atomic_64.h create mode 100644 lib/librte_eal/common/include/arch/x86/rte_byteorder.h create mode 100644 lib/librte_eal/common/includ
[dpdk-dev] [PATCH 02/12] Add atomic operations for IBM Power architecture
Hi, Hemant Actually, I submitted another set of patches to split the architecture specific operations which includes the patch to librte_eal\common\include\rte_atomic.h. Please refer to the previous email. Best Regards! -- Chao Zhu (??) Research Staff Member Cloud Infrastructure and Technology Group IBM China Research Lab Building 19 Zhongguancun Software Park 8 Dongbeiwang West Road, Haidian District, Beijing, PRC. 100193 Tel: +86-10-58748711 Email: bjzhuc at cn.ibm.com From: "Hemant at freescale.com" To: Chao CH Zhu/China/IBM at IBMCN, "dev at dpdk.org" Date: 2014/09/29 14:15 Subject:RE: [dpdk-dev] [PATCH 02/12] Add atomic operations for IBM Power architecture Hi Chao, This Patch seems to be incomplete. You may also need to patch the librte_eal\common\include\rte_atomic.h e.g. #if !(defined RTE_ARCH_X86_64) || !(defined RTE_ARCH_I686) #include #else /* if Intel*/ Otherwise you shall be getting compilation errors for "_mm_mfence" Similar is true for other common header files as well. Regards, Hemant > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Chao Zhu > Sent: 26/Sep/2014 3:06 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH 02/12] Add atomic operations for IBM Power > architecture > > The atomic operations implemented with assembly code in DPDK only support > x86. This patch add architecture specific atomic operations for IBM Power > architecture. > > Signed-off-by: Chao Zhu > --- > .../common/include/powerpc/arch/rte_atomic.h | 387 > > .../common/include/powerpc/arch/rte_atomic_arch.h | 318 > > 2 files changed, 705 insertions(+), 0 deletions(-) create mode 100644 > lib/librte_eal/common/include/powerpc/arch/rte_atomic.h > create mode 100644 > lib/librte_eal/common/include/powerpc/arch/rte_atomic_arch.h > > diff --git a/lib/librte_eal/common/include/powerpc/arch/rte_atomic.h > b/lib/librte_eal/common/include/powerpc/arch/rte_atomic.h > new file mode 100644 > index 000..7f5214e > --- /dev/null > +++ b/lib/librte_eal/common/include/powerpc/arch/rte_atomic.h > @@ -0,0 +1,387 @@ > +/* > + * BSD LICENSE > + * > + * Copyright (C) IBM Corporation 2014. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of IBM Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND > CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT > NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND > FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE > COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, > INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT > NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS > OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND > ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF > THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH > DAMAGE. > +*/ > + > +/* > + * Inspired from FreeBSD src/sys/powerpc/include/atomic.h > + * Copyright (c) 2008 Marcel Moolenaar > + * Copyright (c) 2001 Benno Rice > + * Copyright (c) 2001 David E. O'Brien > + * Copyright (c) 1998 Doug Rabson > + * All rights reserved. > + */ > + > +#ifndef _RTE_ATOMIC_H_ > +#error "don't include this file directly, please include generic " > +#endif > + > +#ifndef _RTE_POWERPC_64_ATOMIC_H_ > +#define _RTE_POWERPC_64_ATOMIC_H_ > + > +/*- 64 bit atomic operations > +-*/ > + > +/** > + * An atomic compare and set function used by the mutex functions. > + * (atomic) equivalent to: > + * if (*dst == exp) > + * *dst = src (a
[dpdk-dev] [PATCH 10/12] Add cache size define for IBM Power Architecture
Hi,Hemant, Actually, the set of patches is only for IBM Power7/8 which has difference cache line size. Of cause, a better way may be detecting the cache line size at runtime not from configuration files... May be we can submit this kind of patch later. Best Regards! -- Chao Zhu (??) Research Staff Member Cloud Infrastructure and Technology Group IBM China Research Lab Building 19 Zhongguancun Software Park 8 Dongbeiwang West Road, Haidian District, Beijing, PRC. 100193 Tel: +86-10-58748711 Email: bjzhuc at cn.ibm.com From: "Hemant at freescale.com" To: Chao CH Zhu/China/IBM at IBMCN, "dev at dpdk.org" Date: 2014/09/29 14:20 Subject:RE: [dpdk-dev] [PATCH 10/12] Add cache size define for IBM Power Architecture > --- a/mk/arch/powerpc/rte.vars.mk > +++ b/mk/arch/powerpc/rte.vars.mk > @@ -32,7 +32,7 @@ > ARCH ?= powerpc > CROSS ?= > > -CPU_CFLAGS ?= -m64 > +CPU_CFLAGS ?= -m64 -DCACHE_LINE_SIZE=128 [hemant] Instead of hardcoding the CACHE_LINE_SIZE, can you drive the CACHE_LINE_SIZE from config file. Other powerpc processor have it as 64. > CPU_LDFLAGS ?= > CPU_ASFLAGS ?= -felf64 > > -- > 1.7.1
[dpdk-dev] [PATCH 1/7] Split atomic operations to architecture specific
Bruce and Neil, Thanks for your comments! Actually, the compiler hides the difference with different architecture. I'll submit another patch to correct this! Best Regards! -- Chao Zhu (??) Research Staff Member Cloud Infrastructure and Technology Group IBM China Research Lab Building 19 Zhongguancun Software Park 8 Dongbeiwang West Road, Haidian District, Beijing, PRC. 100193 Tel: +86-10-58748711 Email: bjzhuc at cn.ibm.com From: Neil Horman To: Bruce Richardson Cc: Chao CH Zhu/China/IBM at IBMCN, dev at dpdk.org Date: 2014/09/29 23:23 Subject:Re: [dpdk-dev] [PATCH 1/7] Split atomic operations to architecture specific On Mon, Sep 29, 2014 at 12:05:22PM +0100, Bruce Richardson wrote: > On Fri, Sep 26, 2014 at 05:33:32AM -0400, Chao Zhu wrote: > > This patch splits the atomic operations from DPDK and push them to > > architecture specific arch directories, so that other processor > > architecture to support DPDK can be easily adopted. > > > > Signed-off-by: Chao Zhu > > --- > > lib/librte_eal/common/Makefile |2 +- > > .../common/include/i686/arch/rte_atomic_arch.h | 378 > > lib/librte_eal/common/include/rte_atomic.h | 172 + > > .../common/include/x86_64/arch/rte_atomic_arch.h | 378 > > 4 files changed, 772 insertions(+), 158 deletions(-) > > create mode 100644 lib/librte_eal/common/include/i686/arch/rte_atomic_arch.h > > create mode 100644 lib/librte_eal/common/include/x86_64/arch/rte_atomic_arch.h > > > <...snip...> > > +#define rte_compiler_barrier() rte_arch_compiler_barrier() > > Small question: shouldn't the compiler barrier be independent of > architecture? > Agreed, compiler intrinsics I thought were used to define barriers, regardless of arch (__memory_barrier() is the gcc intrinsic IIRC) Neil > /Bruce > >
[dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK
Cyril, Thanks for your comments! You are right. SSE needs to be splited. The current split is not a completed one. I'll continue to contribute. Best Regards! -- Chao Zhu (??) Research Staff Member Cloud Infrastructure and Technology Group IBM China Research Lab Building 19 Zhongguancun Software Park 8 Dongbeiwang West Road, Haidian District, Beijing, PRC. 100193 Tel: +86-10-58748711 Email: bjzhuc at cn.ibm.com From: Cyril Chemparathy To: Chao CH Zhu/China/IBM at IBMCN, Date: 2014/10/07 05:39 Subject:Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK On 9/26/2014 2:33 AM, Chao Zhu wrote: > The set of patches split x86 architecture specific operations from DPDK and put them to the > arch directories of i686 and x86_64 architecture. This will make the adpotion of DPDK much easier > on other computer architecture. For a new architecture, just add an architecture specific > directory and necessary building configuration files, then DPDK can support it. Wouldn't the SSE specifics in rte_common.h and rte_common_vect.h need to be similarly split out into architecture specifics? Thanks -- Cyril.
[dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK
David, I agree that your idea may be better for the splitting. However, as Bruce said, I think people would like to see the multi-architecture support feature of DPDK first. We can improve it gradually. Do you have some comments? Best Regards! -- Chao Zhu From: Bruce Richardson To: David Marchand Cc: Chao CH Zhu/China/IBM at IBMCN, "dev at dpdk.org" Date: 2014/10/03 21:28 Subject:Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK On Fri, Oct 03, 2014 at 03:21:53PM +0200, David Marchand wrote: > Hello Chao, > > On Fri, Sep 26, 2014 at 11:33 AM, Chao Zhu wrote: > > > The set of patches split x86 architecture specific operations from DPDK > > and put them to the > > arch directories of i686 and x86_64 architecture. This will make the > > adpotion of DPDK much easier > > on other computer architecture. For a new architecture, just add an > > architecture specific > > directory and necessary building configuration files, then DPDK can > > support it. > > > > > Here is a different approach for the headers splitting. > > If we are going to support multiple architectures, the best would be to > have a specific header for each arch which implements a common API (no need > for any _arch suffix). > These headers would be located in lib/librte_eal/common/include/arch/$arch/ > rather than lib/librte_eal/common/include/$arch/arch/ (which looks odd to > me). > Makefiles can add some -I for dpdk to build itself (and we can remove those > symlinks from the makefiles). > Makefiles only install the specific headers in RTE_SDK/include for use by > applications. > > For common code and documentation, we can add a "generic" directory in > lib/librte_eal/common/include (or "arch-generic", or "shared" ... any > better idea ?). > DPDK makefiles installs the generic headers in RTE_SDK/include/generic. > arch headers (like rte_atomic.h) include the generic one > (). > > These generic headers can be implemented using compiler intrinsics when > possible. > They also include the doxygen stuff in a single place. > > > This would look like something like this, for rte_atomic.h : > - in DPDK sources > $ ls lib/librte_eal/common/include/*/rte_atomic.h > lib/librte_eal/common/include/i686/rte_atomic.h > lib/librte_eal/common/include/x86_64/rte_atomic.h > lib/librte_eal/common/include/generic/rte_atomic.h > > - in installed RTE_SDK > $ ls RTE_SDK/include/{,*/}rte_atomic.h > RTE_SDK/include/rte_atomic.h > RTE_SDK/include/generic/rte_atomic.h > > Comments ? > > > I am only focusing on the first patchset at the moment, but if we can find > consensus here, a respin of the two patchsets would be great. > > Thanks. > > -- > David Marchand I would have no objection to such a scheme. However, I'm not seeing much advantage over the existing way of doing things. I think I'd rather see the proposed patch sets merged first and then any additional cleanup done, rather than holding up a worthwhile submission for a bit of tidy-up. /Bruce
[dpdk-dev] [PATCH 09/12] Remove iopl operation for IBM Power architecture
OK. I'll update the patches. Thanks for your comments! Best Regards! -- Chao Zhu From: "Ananyev, Konstantin" To: Cyril Chemparathy , Chao CH Zhu/China/IBM at IBMCN, "dev at dpdk.org" Date: 2014/10/07 22:45 Subject:RE: [dpdk-dev] [PATCH 09/12] Remove iopl operation for IBM Power architecture > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cyril Chemparathy > Sent: Monday, October 06, 2014 11:04 PM > To: Chao Zhu; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH 09/12] Remove iopl operation for IBM Power architecture > > On 9/26/2014 2:36 AM, Chao Zhu wrote: > > iopl() call is mostly for the i386 architecture. In Power architecture. > > It doesn't exist. This patch modified rte_eal_iopl_init() and make it > > return -1 on Power. This means rte_config.flags will not contain > > EAL_FLG_HIGH_IOPL flag on IBM Power architecture. > > Since iopl() is an x86-only thing, shouldn't the code be conditional on > defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686) instead of below? > > Better still, should we maybe break out an architecture specific init > function? This function could set iopl on x86, and possibly do other > lowlevel init things on other architectures... Yep, that sounds like a good way to me too. > > > Signed-off-by: Chao Zhu > > --- > > lib/librte_eal/linuxapp/eal/eal.c | 11 +++ > > 1 files changed, 11 insertions(+), 0 deletions(-) > > > > diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c > > index 4869e7c..8cc1f21 100644 > > --- a/lib/librte_eal/linuxapp/eal/eal.c > > +++ b/lib/librte_eal/linuxapp/eal/eal.c > > @@ -50,7 +50,10 @@ > > #include > > #include > > #include > > +/* Power architecture doesn't have this header file */ > > +#ifndef RTE_ARCH_PPC_64 > > #include > > +#endif > > > > #include > > #include > > @@ -1019,11 +1022,19 @@ rte_eal_mcfg_complete(void) > > > > /* > >* Request iopl privilege for all RPL, returns 0 on success > > + * > > + * Power architecture doesn't have iopl function, so this function > > + * return -1 on Power architecture, because this function is only used > > + * in rte_eal_init to add EAL_FLG_HIGH_IOPL to rte_config.flags. > >*/ > > static int > > rte_eal_iopl_init(void) > > { > > +#ifndef RTE_ARCH_PPC_64 > > return iopl(HIGHEST_RPL); > > +#else > > +return -1; > > +#endif > > } > > > > /* Launch threads, called at application init(). */
[dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK
David, I'll update the patches acccording to your comments. Thanks! Best Regards! -- Chao Zhu From: David Marchand To: Chao CH Zhu/China/IBM at IBMCN Cc: "dev at dpdk.org" Date: 2014/10/03 21:21 Subject:Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK Hello Chao, On Fri, Sep 26, 2014 at 11:33 AM, Chao Zhu wrote: The set of patches split x86 architecture specific operations from DPDK and put them to the arch directories of i686 and x86_64 architecture. This will make the adpotion of DPDK much easier on other computer architecture. For a new architecture, just add an architecture specific directory and necessary building configuration files, then DPDK can support it. Here is a different approach for the headers splitting. If we are going to support multiple architectures, the best would be to have a specific header for each arch which implements a common API (no need for any _arch suffix). These headers would be located in lib/librte_eal/common/include/arch/$arch/ rather than lib/librte_eal/common/include/$arch/arch/ (which looks odd to me). Makefiles can add some -I for dpdk to build itself (and we can remove those symlinks from the makefiles). Makefiles only install the specific headers in RTE_SDK/include for use by applications. For common code and documentation, we can add a "generic" directory in lib/librte_eal/common/include (or "arch-generic", or "shared" ... any better idea ?). DPDK makefiles installs the generic headers in RTE_SDK/include/generic. arch headers (like rte_atomic.h) include the generic one (). These generic headers can be implemented using compiler intrinsics when possible. They also include the doxygen stuff in a single place. This would look like something like this, for rte_atomic.h : - in DPDK sources $ ls lib/librte_eal/common/include/*/rte_atomic.h lib/librte_eal/common/include/i686/rte_atomic.h lib/librte_eal/common/include/x86_64/rte_atomic.h lib/librte_eal/common/include/generic/rte_atomic.h - in installed RTE_SDK $ ls RTE_SDK/include/{,*/}rte_atomic.h RTE_SDK/include/rte_atomic.h RTE_SDK/include/generic/rte_atomic.h Comments ? I am only focusing on the first patchset at the moment, but if we can find consensus here, a respin of the two patchsets would be great. Thanks. -- David Marchand
[dpdk-dev] [PATCH 02/12] Add atomic operations for IBM Power architecture
Konstantin, In my understanding, compiler barrier is a kind of software barrier which prevents the compiler from moving memory accesses across the barrier. This should be architecture-independent. And the "sync" instruction is a hardware barrier which depends on PowerPC architecture. So I think the compiler barrier should be the same on x86 and PowerPC. Any comments? Please correct me if I was wrong. Thanks a lot! Best Regards! -- Chao Zhu From: "Ananyev, Konstantin" To: Chao CH Zhu/China/IBM at IBMCN, "dev at dpdk.org" Date: 2014/10/16 08:38 Subject:RE: [dpdk-dev] [PATCH 02/12] Add atomic operations for IBM Power architecture Hi, > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Chao Zhu > Sent: Friday, September 26, 2014 10:36 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH 02/12] Add atomic operations for IBM Power architecture > > The atomic operations implemented with assembly code in DPDK only > support x86. This patch add architecture specific atomic operations for > IBM Power architecture. > > Signed-off-by: Chao Zhu > --- > .../common/include/powerpc/arch/rte_atomic.h | 387 > .../common/include/powerpc/arch/rte_atomic_arch.h | 318 > 2 files changed, 705 insertions(+), 0 deletions(-) > create mode 100644 lib/librte_eal/common/include/powerpc/arch/rte_atomic.h > create mode 100644 lib/librte_eal/common/include/powerpc/arch/rte_atomic_arch.h > ... > + > diff --git a/lib/librte_eal/common/include/powerpc/arch/rte_atomic_arch.h > b/lib/librte_eal/common/include/powerpc/arch/rte_atomic_arch.h > new file mode 100644 > index 000..fe5666e > --- /dev/null > + ... >+#definerte_arch_rmb() asm volatile("sync" : : : "memory") >+ > +#define rte_arch_compiler_barrier() do {\ > + asm volatile ("" : : : "memory"); \ > +} while(0) I don't know much about PPC architecture, but as I remember it uses a weakly-ordering memory model. Is that correct? If so, then you probably need rte_arch_compiler_barrier() to be "sync" instruction (like mb()s above) . The reason is that IA has much stronger memory ordering model and there are a lot of places in the code where it implies that ordering. For example - ring enqueue/dequeue functions. Konstantin