On Wed, Jul 30, 2014 at 11:59:03AM -0700, Bruce Richardson wrote:
> On Tue, Jul 29, 2014 at 04:24:24PM -0400, Neil Horman wrote:
> > Hey all-
> >         I've been trying to update the fedora dpdk package to support VFIO
> > enabled drivers and ran into a problem in which ixgbe didn't compile because
> > the rxtx_vec code uses sse4.2 instruction intrinsics, which aren't supported
> > in the default config I have.  I tried to remedy this by replacing the
> > intrinsics with the __builtin macros, but it was pointed out (correctly), that
> > this doesn't work properly.  So this is my second attempt, which I actually
> > like a bit better.  I noted that code that uses intrinsics (ixgbe and the acl
> > library), don't need to have those instructions turned on build-wide.  Rather,
> > we can just enable the instructions in the specific code we want to build with
> > support for that, and test for instruction support dynamically at run time.
> > This allows me to build the dpdk for a generic platform, but in such a way
> > that some optimizations can be used if the executing cpu supports them at run
> > time.
> > 
> > Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
> > CC: Thomas Monjalon <thomas.monjalon at 6wind.com>
> >
> I'd prefer if a solution could be found based off your original patch
> set, as it gives us more chance to deprecate the older code paths in
> future. Looking at the Intel Intrinsics Guide site online, it shows that
> the _mm_shuffle_epi8 intrinsic came in with SSSE3, rather than SSE4.x,
> and so should be available on all 64-bit systems, I believe. The
> popcount intrinsic is newer, but it's a much more basic instruction so
> hopefully the __builtin should work for that.
> 
Yes, but as I look at it, that's somewhat counter to my goal, which is to offer
accelerated code paths on systems that can make use of them at run time.  If we
use the __builtin compiler functions, we will either:

1) Build those code paths with advanced instructions that won't work on older
systems (i.e. crash)

2) Build those code paths with less advanced instructions, meaning that we won't
speedup execution on systems that are capable of using the more advanced
instructions.

With the run time check, we can, at least in these situations, use the
accelerated paths when the instructions are available and skip them when
they're not.
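
As a rough sketch of the idea (not the actual patch; the function names below
are made up for illustration, and it assumes gcc on x86 with per-function
target attributes rather than per-file CFLAGS):

```c
#include <stdint.h>
#include <nmmintrin.h>  /* declares _mm_popcnt_u32 */

/* Portable fallback: safe on any cpu the generic build targets. */
static uint32_t popcount_generic(uint32_t v)
{
    uint32_t n = 0;

    while (v) {
        v &= v - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Accelerated path: only this function is compiled with popcnt enabled,
 * so the rest of the build stays generic.
 */
__attribute__((target("popcnt")))
static uint32_t popcount_hw(uint32_t v)
{
    return (uint32_t)_mm_popcnt_u32(v);
}

typedef uint32_t (*popcount_fn)(uint32_t);

/* Pick an implementation once, based on what the executing cpu reports. */
static popcount_fn select_popcount(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt"))
        return popcount_hw;
    return popcount_generic;
}
```

A caller would do the select_popcount() lookup once at init time and then call
through the returned pointer, so the accelerated path is never executed on a
cpu that lacks the instruction.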

What would be ideal is an alternatives-style macro, like the one the Linux
kernel employs, but implementing that would require some pretty significant
work and testing.  This seems like a much simpler approach.

Neil

> Regards,
> /Bruce
> 
