> Hopefully someone from AMD will provide tests that are mysteriously missing 
> from XOP testsuite.

As pointed out by Marc, I added myself to the bug later. 
I was bit confused about the "internal insn representation" with "user-visible 
function".
So, couldn't add test then and there. I could have solved that earlier. Sorry 
for that.

Attached is the test that checks the (controversial) "frcz" functions.

Uros could you please add this to your patch while committing.

Regards
Ganesh

-----Original Message-----
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Saturday, November 23, 2013 6:49 PM
To: gcc-patches@gcc.gnu.org
Cc: Cong Hou; Marc Glisse; Gopalasubramanian, Ganesh
Subject: [PATCH, i386]: Fix PR56788, _mm_frcz_sd and _mm_frcz_ss ignore their 
second argument

Hello!

Attached patch fixes PR56788, where _mm_frcz_{ss,sd} intrinsics ignored their 
second argument.

As explained in the PR [1], gcc implements two-operand "vector-merge"
form as documented in Microsoft's definition [2]. However, in contrast to other 
SSE scalar insns, the instruction itself clears upper bits to zero.

There were a couple of problems: the builtin was declared as builtin with two 
input operands, but the number of input operands didn't correspond to referred 
insn pattern, leaving its second operand uninitialized. The intrinsic was also 
implemented without necessary movss/movsd fixup that would merge both its 
operands in a correct way.

Please also note that the definition in clang is wrong.

I didn't include any testcase in the patch, since I don't have access to XOP 
target. Hopefully someone from AMD will provide tests that are mysteriously 
missing from XOP testsuite.

2013-11-23  Uros Bizjak  <ubiz...@gmail.com>

    PR target/56788
    * config/i386/i386.c (bdesc_multi_arg) <IX86_BUILTIN_VFRCZSS>:
    Declare as MULTI_ARG_1_SF instruction.
    <IX86_BUILTIN_VFRCZSD>: Decleare as MULTI_ARG_1_DF instruction.
    * config/i386/sse.md (*xop_vmfrcz<mode>2): Rename
    from *xop_vmfrcz_<mode>.
    * config/i386/xopintrin.h (_mm_frcz_ss): Use __builtin_ia32_movss
    to merge scalar result with __A.
    (_mm_frcz_sd): Use __builtin_ia32_movsd to merge scalar
    result with __A.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

The patch was committed to mainline SVN and will be committed to other release 
branches in a couple of days (hopefully with additional tests).

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56788
[2] http://msdn.microsoft.com/en-us/library/vstudio/gg445126%28v=vs.100%29.aspx

Uros.
#include <x86intrin.h>
#include "m128-check.h"

void
check_mm_vmfrcz_sd (__m128d __A, __m128d __B)
{
  union128d a, b, c;
  double d[2];

  a.x = __A;
  b.x = __B;
  c.x = _mm_frcz_sd (__A, __B);
  d[0] = b.a[0] - (int)b.a[0] ;
  d[1] = a.a[1];
  if (check_union128d (c, d))
    abort ();
}

void
check_mm_vmfrcz_ss (__m128 __A, __m128 __B)
{
  union128 a, b, c;
  float f[4];

  a.x = __A;
  b.x = __B;
  c.x = _mm_frcz_ss (__A, __B);
  f[0] = b.a[0] - (int)b.a[0] ;
  f[1] = a.a[1];
  f[2] = a.a[2];
  f[3] = a.a[3];
  if (check_union128 (c, f))
    abort ();
}

void
main (void)
{
  union128 a, b;
  union128d c,d;
  int i;

  for (i = 0; i < 4; i++)
    {
       a.a[i] = i + 3.5;
       b.a[i] = i + 7.9;
    }
  for (i = 0; i < 2; i++)
    {
       c.a[i] = i + 3.5;
       d.a[i] = i + 7.987654321;
    }
  check_mm_vmfrcz_ss (a.x, b.x);
  check_mm_vmfrcz_sd (c.x, d.x);
}

Reply via email to