------- Comment #4 from pinskia at gcc dot gnu dot org  2007-03-22 01:38 -------
Inline version:
  r.dst[0].i = MEM[base: d];
  D.6423 = r.dst[0].i;
  D.6449 = __builtin_ia32_paddusb128 (VIEW_CONVERT_EXPR<__v16qi>(D.6423),
VIEW_CONVERT_EXPR<__v16qi>(D.6423));
  r.dst[0].i = VIEW_CONVERT_EXPR<__m128i>(D.6449);
  __builtin_ia32_movntdq ((__m128i *) d, r.dst[0].i);
  d = d + 16B;


Macro version:
  D.6414 = MEM[base: d];
  D.6435 = __builtin_ia32_paddusb128 (VIEW_CONVERT_EXPR<__v16qi>(D.6414),
VIEW_CONVERT_EXPR<__v16qi>(D.6414));
  __builtin_ia32_movntdq ((__m128i *) d, VIEW_CONVERT_EXPR<__m128i>(D.6435));
  d = d + 16B;

So in the inline version the store to and reload from r.dst[0].i are not
being optimized away. I have not looked into why.
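
For reference, here is a rough sketch of the kind of source that could give
the two dumps above. The original testcase is not quoted in this comment, so
the type layout and the names vec_u, regs_t, double_sat_inline,
DOUBLE_SAT_MACRO, run_inline and run_macro are all assumptions made for
illustration. Only r.dst[0].i, paddusb128 (_mm_adds_epu8) and movntdq
(_mm_stream_si128) come from the dumps.

```c
#include <emmintrin.h>  /* SSE2: _mm_adds_epu8, _mm_stream_si128 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: only the access path r.dst[0].i is from the dump. */
typedef union {
    __m128i i;
    uint8_t b[16];
} vec_u;

typedef struct {
    vec_u dst[1];
} regs_t;

/* Inline-function variant: the value travels through r->dst[0].i. */
static inline void double_sat_inline(regs_t *r)
{
    /* paddusb: unsigned saturating byte add of the value with itself */
    r->dst[0].i = _mm_adds_epu8(r->dst[0].i, r->dst[0].i);
}

/* Macro variant: same operation, no aggregate in between. */
#define DOUBLE_SAT_MACRO(v) _mm_adds_epu8((v), (v))

/* d must be 16-byte aligned and n a multiple of 16. */
void run_inline(uint8_t *d, size_t n)
{
    regs_t r;
    for (size_t k = 0; k < n; k += 16) {
        r.dst[0].i = _mm_load_si128((const __m128i *)(d + k));
        double_sat_inline(&r);
        _mm_stream_si128((__m128i *)(d + k), r.dst[0].i);  /* movntdq */
    }
    _mm_sfence();  /* make the non-temporal stores visible */
}

void run_macro(uint8_t *d, size_t n)
{
    for (size_t k = 0; k < n; k += 16) {
        __m128i v = _mm_load_si128((const __m128i *)(d + k));
        _mm_stream_si128((__m128i *)(d + k), DOUBLE_SAT_MACRO(v));
    }
    _mm_sfence();
}
```

Both functions compute the same result. The point of comparison is only
whether the compiler keeps the temporary in a register or leaves the extra
accesses to r.dst[0].i in place, as in the inline dump.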


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  GCC build triplet|x86_64-redhat-linux         |
   GCC host triplet|x86_64-redhat-linux         |
           Keywords|                            |missed-optimization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31307
