> -----Original Message-----
> From: EDMISON, Kelvin (Kelvin) [mailto:kelvin.edmison at alcatel-lucent.com]
> Sent: Thursday, January 29, 2015 5:48 AM
> To: Wang, Zhihong; Stephen Hemminger; Neil Horman
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
>
>
> On 2015-01-27, 3:22 AM, "Wang, Zhihong" <zhihong.wang at intel.com> wrote:
>
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of EDMISON, Kelvin (Kelvin)
> >> Sent: Friday, January 23, 2015 2:22 AM
> >> To: dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> >>
> >>
> >> On 2015-01-21, 3:54 PM, "Neil Horman" <nhorman at tuxdriver.com> wrote:
> >>
> >> >On Wed, Jan 21, 2015 at 11:49:47AM -0800, Stephen Hemminger wrote:
> >> >> On Wed, 21 Jan 2015 13:26:20 +0000 Bruce Richardson
> >> >> <bruce.richardson at intel.com> wrote:
> >> >> > [..trim...]
> >> >> One issue I have is that as a vendor we need to ship on binary,
> >> >> not different distributions for each Intel chip variant. There is
> >> >> some support for multi-chip version functions but only in latest
> >> >> Gcc which isn't in Debian stable. And the multi-chip version
> >> >> of functions is going to be more expensive than inlining. For some
> >> >> cases, I have seen that the overhead of fancy instructions looks
> >> >> good but have nasty side effects like CPU stall and/or increased
> >> >> power consumption which turns of turbo boost.
> >> >>
> >> >>
> >> >> Distro's in general have the same problem with special case
> >> >> optimizations.
> >> >>
> >> >What we really need is to do something like borrow the alternatives
> >> >mechanism from the kernel so that we can dynamically replace
> >> >instructions at run time based on cpu flags. That way we could make
> >> >the choice at run time, and wouldn't have to do alot of special case
> >> >jumping about.
> >> >Neil
> >>
> >> +1.
> >>
> >> I think it should be an anti-requirement that the build machine be
> >> the exact same chip as the deployment platform.
> >>
> >> I like the cpu flag inspection approach. It would help in the case
> >> where DPDK is in a VM and an odd set of CPU flags have been exposed.
> >>
> >> If that approach doesn't work though, then perhaps DPDK memcpy could
> >> go through a benchmarking at app startup time and select the most
> >> performant option out of a set, like mdraid's raid6 implementation
> >> does. To give an example, this is what my systems print out at boot
> >> time re: raid6 algorithm selection.
> >> raid6: sse2x1 3171 MB/s
> >> raid6: sse2x2 3925 MB/s
> >> raid6: sse2x4 4523 MB/s
> >> raid6: using algorithm sse2x4 (4523 MB/s)
> >>
> >> Regards,
> >> Kelvin
> >>
> >
> >Thanks for the proposal!
> >
> >For DPDK, performance is always the most important concern. We need to
> >utilize new architecture features to achieve that, so solution per arch
> >is necessary.
> >Even a few extra cycles can lead to bad performance if they're in a hot
> >loop.
> >For instance, let's assume DPDK takes 60 cycles to process a packet on
> >average, then 3 more cycles here means 5% performance drop.
> >
> >The dynamic solution is doable but with performance penalties, even if
> >it could be small. Also it may bring extra complexity, which can lead
> >to unpredictable behaviors and side effects.
> >For example, the dynamic solution won't have inline unrolling, which
> >can bring significant performance benefit for small copies with
> >constant length, like eth_addr.
> >
> >We can investigate the VM scenario more.
> >
> >Zhihong (John)
>
> John,
>
> Thanks for taking the time to answer my newbie question. I deeply
> appreciate the attention paid to performance in DPDK. I have a follow-up
> though.
>
> I'm trying to figure out what requirements this approach creates for the
> software build environment. If we want to build optimized versions for
> Haswell, Ivy Bridge, Sandy Bridge, etc, does this mean that we must have one
> of each micro-architecture available for running the builds, or is there a way
> of cross-compiling for all micro-architectures from just one build
> environment?
>
> Thanks,
> Kelvin
>

I'm not an expert in this; what follows is just what I have seen in my
own tests.

Whether the code compiles depends on the compiler and library versions,
not on the CPU of the build machine. So even on a machine that doesn't
support the necessary ISA, it should still compile as long as gcc, glibc,
etc. support it. You'll only get "Illegal instruction" when you try to
launch the compiled binary on that machine.

Therefore, if there's a way (worst case: changing the flags manually) to
make the DPDK build process think it's on a Haswell machine, it will
produce Haswell binaries.

Zhihong (John)
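
To illustrate the build-versus-run distinction above, here is a minimal
sketch of a startup guard, assuming GCC (or a compatible compiler) on x86.
It is not part of the DPDK patch set. The helper names cpu_has_avx2() and
safe_to_launch() are hypothetical, and using a flag such as GCC's
-march=haswell to force a Haswell build is my assumption about how the
"change flags manually" case might look. The guard itself would have to
live in a file compiled without that flag, otherwise the check could
already contain the instructions it is trying to detect.

```c
#include <stdio.h>

/* Hypothetical helper: 1 if the CPU we are running on reports AVX2,
 * 0 otherwise. Relies on the GCC x86 builtins; on other compilers or
 * architectures it conservatively answers 0. */
static int cpu_has_avx2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Hypothetical helper: a binary built for an ISA extension is only
 * safe to launch if the running CPU actually has that extension. */
static int safe_to_launch(int binary_needs_avx2, int cpu_supports_avx2)
{
    return !binary_needs_avx2 || cpu_supports_avx2;
}

/* Example of how an application might report the situation early,
 * instead of dying later with "Illegal instruction". */
static void report_launch_check(int binary_needs_avx2)
{
    if (safe_to_launch(binary_needs_avx2, cpu_has_avx2()))
        puts("CPU matches the build target");
    else
        puts("CPU lacks AVX2: this build would hit Illegal instruction");
}
```

Calling report_launch_check(1) at the top of main() would be one way to
turn the crash into a readable message. It does not remove the need for a
separate binary per target.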