On Sun, Dec 13, 2015 at 10:34 PM, Matt Turner <matts...@gmail.com> wrote:
> On Sun, Dec 13, 2015 at 5:23 AM, Oded Gabbay <oded.gab...@gmail.com> wrote:
>> On Sun, Dec 13, 2015 at 11:56 AM, Jonathan Gray <j...@jsg.id.au> wrote:
>>> On Sat, Dec 12, 2015 at 06:41:56PM +0000, Emil Velikov wrote:
>>>> On 10 December 2015 at 08:42, Oded Gabbay <oded.gab...@gmail.com> wrote:
>>>> > On Wed, Dec 9, 2015 at 8:30 PM, Matt Turner <matts...@gmail.com> wrote:
>>>> >> On Tue, Dec 8, 2015 at 9:37 PM, Jonathan Gray <j...@jsg.id.au> wrote:
>>>> >>> Change the __m128i variables to be volatile so gcc 4.9 won't optimise
>>>> >>> all of them out with -O1 or greater.  The _mm_set1_epi32/pinsrd calls
>>>> >>> still get optimised out but now there is at least one SSE4.1 instruction
>>>> >>> generated via _mm_max_epu32/pmaxud.  When all of the sse4.1 instructions
>>>> >>> got optimised out the configure test would incorrectly pass when the
>>>> >>> compiler supported the intrinsics and the assembler didn't support the
>>>> >>> instructions.
>>>> >>>
>>>> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91806
>>>> >>> Signed-off-by: Jonathan Gray <j...@jsg.id.au>
>>>> >>> Cc: "11.0 11.1" <mesa-sta...@lists.freedesktop.org>
>>>> >>> ---
>>>> >>>  configure.ac | 2 +-
>>>> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>> >>>
>>>> >>> diff --git a/configure.ac b/configure.ac
>>>> >>> index 260934d..1d82e47 100644
>>>> >>> --- a/configure.ac
>>>> >>> +++ b/configure.ac
>>>> >>> @@ -384,7 +384,7 @@ CFLAGS="$SSE41_CFLAGS $CFLAGS"
>>>> >>>  AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
>>>> >>>  #include <smmintrin.h>
>>>> >>>  int main () {
>>>> >>> -    __m128i a = _mm_set1_epi32 (0), b = _mm_set1_epi32 (0), c;
>>>> >>> +    volatile __m128i a = _mm_set1_epi32 (0), b = _mm_set1_epi32 (0), c;
>>>> >>>      c = _mm_max_epu32(a, b);
>>>> >>>      return 0;
>>>> >>
>>>> >> I would have extracted an int from the result of _mm_max_epu32 and
>>>> >> returned that instead of 0.
>>>> >
>>>> > Instead of the volatile I assume ?
>>>> >
>>>> Precisely. If anyone wants to follow on Matt's suggestion we can pick
>>>> that one as well. I'd like to get a patch for the next stable releases
>>>> (next Friday for 11.0.x and just after new year for 11.1.1) so I'll
>>>> take whatever's around :-)
>>>>
>>>> -Emil
>>>
>>> I avoided that as I wasn't sure if there was a case where autoconf
>>> cared about the return code.  If someone wants to create a new diff
>>> feel free, I have limited connectivity till the middle of next week.
>>
>> So I'm not a huge SSE expert, but I tried doing this (remove volatile
>> and return _mm_cvtsi128_si32 of c):
>>
>> ------------------------
>> #include <mmintrin.h>
>> #include <xmmintrin.h>
>> #include <emmintrin.h>
>>
>> int main () {
>>     __m128i a = _mm_set1_epi32 (0), b = _mm_set1_epi32 (0), c;
>>     c = _mm_xor_si128 (a, b);
>>     return _mm_cvtsi128_si32(c);
>> }
>> -------------------------
>>
>> When compiling with "gcc -O1 -msse2", gcc is 4.8.5 (from RHEL 7.2), I got:
>>
>> ---------------------
>> main:
>> .LFB521:
>>     .cfi_startproc
>>     movl $0, %eax
>>     ret
>>     .cfi_endproc
>> -------------------
>>
>> So unless I misunderstood matt's suggestion, I think we *have* to use
>> the volatile as it forces the compiler to produce pxor and movdqa
>> assembly commands.
>
> Since all the arguments to the intrinsics are constants, GCC is
> constant-evaluating them.
>
> I expect all you'd need to do is pass some global variables to the
> intrinsics or similar.

OK, so what helped was this:

int param;

int main () {
    __m128i a = _mm_set1_epi32 (param), b = _mm_set1_epi32 (param+1), c;

Notice the (param+1): when both a and b are built from just (param), the
compiler still optimizes the operation away. That is quite understandable,
as XORing a value with itself gives 0.

    Oded