On 23.07.2013 19:08, Andre Heider wrote:
> For AVX it's not sufficient to only rely on the cpuid flags. If the CPU
> supports these extensions, but the OS doesn't, issuing these insns will
> trigger an undefined opcode exception.
>
> In addition to the AVX cpuid bit we also need to:
> * test cpuid for OSXSAVE support
> * XGETBV to check if the OS saves/restores AVX regs on context switches
>
> See "Detecting Availability and Support" at
> http://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions
>
> Signed-off-by: Andre Heider <a.hei...@gmail.com>
> ---
>  src/gallium/auxiliary/util/u_cpu_detect.c | 27 +++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c
> b/src/gallium/auxiliary/util/u_cpu_detect.c
> index b118fc8..588fc7c 100644
> --- a/src/gallium/auxiliary/util/u_cpu_detect.c
> +++ b/src/gallium/auxiliary/util/u_cpu_detect.c
> @@ -67,7 +67,7 @@
>
>  #if defined(PIPE_OS_WINDOWS)
>  #include <windows.h>
> -#if defined(MSVC)
> +#if defined(PIPE_CC_MSVC)
>  #include <intrin.h>
>  #endif
>  #endif
> @@ -211,6 +211,27 @@ cpuid(uint32_t ax, uint32_t *p)
>     p[3] = 0;
>  #endif
>  }
> +
> +static INLINE uint64_t xgetbv(void)
> +{
> +#if defined(PIPE_CC_GCC)
> +   uint32_t eax, edx;
> +
> +   __asm __volatile (
> +     ".byte 0x0f, 0x01, 0xd0" // xgetbv isn't supported on gcc < 4.4
> +     : "=a"(eax),
> +       "=d"(edx)
> +     : "c"(0)
> +   );
> +
> +   return ((uint64_t)edx << 32) | eax;
> +#elif defined(PIPE_CC_MSVC) && defined(_MSC_FULL_VER) &&
> defined(_XCR_XFEATURE_ENABLED_MASK)
> +   return _xgetbv(_XCR_XFEATURE_ENABLED_MASK);
> +#else
> +   return 0;
> +#endif
> +
> +}
>  #endif /* X86 or X86_64 */
>
>  void
> @@ -284,7 +305,9 @@ util_cpu_detect(void)
>           util_cpu_caps.has_sse4_1 = (regs2[2] >> 19) & 1;
>           util_cpu_caps.has_sse4_2 = (regs2[2] >> 20) & 1;
>           util_cpu_caps.has_popcnt = (regs2[2] >> 23) & 1;
> -         util_cpu_caps.has_avx = (regs2[2] >> 28) & 1;
> +         util_cpu_caps.has_avx = ((regs2[2] >> 28) & 1) && // AVX
> +                                 ((regs2[2] >> 27) & 1) && // OSXSAVE
> +                                 ((xgetbv() & 6) == 6);    // XMM & YMM
>           util_cpu_caps.has_f16c = (regs2[2] >> 29) & 1;
>           util_cpu_caps.has_mmx2 = util_cpu_caps.has_sse; /* SSE cpus
> supports mmxext too */
>
>
Looks good to me, though it's a pity that detection depends on the compiler. Granted, it looks like icc currently won't work, but still...
I guess that technically the test for sse(x) isn't correct either, as that too requires OS support. I don't know offhand how to check for it, though (and we'd be talking about an ANCIENT OS here...).

Roland
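
For reference, the three-step check from the patch (AVX cpuid bit, OSXSAVE cpuid bit, then XCR0 via XGETBV) could be sketched standalone roughly as below. This is only an illustration, not the Mesa code: it assumes GCC or clang on x86 with their <cpuid.h> helper __get_cpuid, and the names sketch_has_usable_avx, cpu_reports_avx_and_osxsave and os_saves_ymm are made up for this example. On any other compiler or architecture it simply reports "no AVX".

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>              /* GCC/clang helper header providing __get_cpuid() */
#define SKETCH_X86_GCC 1
#endif

/* CPUID leaf 1, ECX feature bits (same bit numbers as in the patch). */
#define CPUID1_ECX_OSXSAVE (1u << 27)
#define CPUID1_ECX_AVX     (1u << 28)

/* XCR0 state-component bits: the patch's "& 6" is SSE (XMM) | AVX (YMM). */
#define XCR0_SSE (1u << 1)
#define XCR0_AVX (1u << 2)

/* Hypothetical helper: does the CPU advertise both AVX and OSXSAVE? */
static int cpu_reports_avx_and_osxsave(uint32_t ecx)
{
    const uint32_t need = CPUID1_ECX_AVX | CPUID1_ECX_OSXSAVE;
    return (ecx & need) == need;
}

/* Hypothetical helper: does XCR0 say the OS saves/restores XMM and YMM? */
static int os_saves_ymm(uint64_t xcr0)
{
    const uint64_t need = XCR0_SSE | XCR0_AVX;
    return (xcr0 & need) == need;
}

/* Returns 1 if AVX insns should be safe to issue, 0 otherwise. */
static int sketch_has_usable_avx(void)
{
#ifdef SKETCH_X86_GCC
    unsigned int eax, ebx, ecx, edx;
    uint32_t lo, hi;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;               /* leaf 1 not available */
    if (!cpu_reports_avx_and_osxsave(ecx))
        return 0;               /* must bail out before touching XGETBV */

    /* XGETBV with ECX = 0 reads XCR0. Raw opcode bytes as in the patch,
     * so that old assemblers without the mnemonic still work. */
    __asm__ __volatile__(".byte 0x0f, 0x01, 0xd0"
                         : "=a"(lo), "=d"(hi)
                         : "c"(0));
    return os_saves_ymm(((uint64_t)hi << 32) | lo);
#else
    return 0;                   /* unknown compiler/arch: claim no AVX */
#endif
}
```

The ordering matters: XGETBV itself raises an undefined opcode exception when OSXSAVE is not set, so it may only run after the cpuid bits have been checked. The patch gets the same effect from the short-circuiting && chain. A caller would just do something like `if (sketch_has_usable_avx()) { /* pick the AVX code path */ }`.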