http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56935



Richard Biener <rguenth at gcc dot gnu.org> changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |WAITING



--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> 2013-04-15 14:38:40 UTC ---

Reduced testcase:



typedef struct {
  long int x;
  long int y;
} S;
void foo (S *s)
{
  s->x--;
  s->y--;
}



Difference in cost model analysis:



before:



t.c:7: note: Cost model analysis:
  Vector inside of basic block cost: 5
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar cost of basic block: 6



after:



t.c:7: note: Cost model analysis:
  Vector inside of basic block cost: 5
  Vector prologue cost: 1
  Vector epilogue cost: 0
  Scalar cost of basic block: 6



The "after" analysis is more correct, as we need to synthesize the { 1, 1 } vector.

What isn't really optimal is the unchanged vector inside cost. It consists of an unaligned load with cost 2, the vector operation with cost 1, and the unaligned store with cost 2.
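The arithmetic behind the two dumps can be sketched in C. This is only an illustration: the struct and function names are hypothetical, and the rule that a basic block is vectorized only when the total vector cost is strictly below the scalar cost is an assumption, although it is consistent with the code generated before and after the change.

```c
#include <assert.h>
#include <stdbool.h>

/* Cost figures as printed in the "Cost model analysis" dump.
   Hypothetical type, not a GCC data structure.  */
struct bb_costs {
  int vec_inside;
  int vec_prologue;
  int vec_epilogue;
  int scalar;
};

/* Inside cost as broken down above:
   unaligned load (2) + vector operation (1) + unaligned store (2).  */
int vec_inside_cost (void)
{
  return 2 + 1 + 2;
}

/* Assumption: vectorization is taken only if the summed vector cost
   is strictly lower than the scalar cost.  */
bool bb_vectorization_profitable (struct bb_costs c)
{
  int vec_total = c.vec_inside + c.vec_prologue + c.vec_epilogue;
  return vec_total < c.scalar;
}

/* Replays the two dumps: 5 + 0 + 0 < 6 holds, 5 + 1 + 0 < 6 does not.  */
void check_dump_figures (void)
{
  struct bb_costs before = { vec_inside_cost (), 0, 0, 6 };
  struct bb_costs after = { vec_inside_cost (), 1, 0, 6 };

  assert (bb_vectorization_profitable (before));
  assert (!bb_vectorization_profitable (after));
}
```

Under that assumption, the prologue cost of 1 alone is enough to tip the decision, because the vector total then ties with the scalar cost.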



Before we generated



        pcmpeqd %xmm0, %xmm0
        movdqu  (%rdi), %xmm1
        paddq   %xmm1, %xmm0
        movdqu  %xmm0, (%rdi)
        ret



and afterwards



        subq    $1, (%rdi)
        subq    $1, 8(%rdi)



I'd say it's obvious that the non-vectorized variant is better.
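For reference, the two code shapes can be written out at the source level. This is a rough sketch using the GCC/Clang vector_size extension, not what the vectorizer emits internally; foo_scalar, foo_vector, v2li and check_equivalent are hypothetical names, and memcpy stands in for the unaligned load and store.

```c
#include <assert.h>
#include <string.h>

typedef struct {
  long int x;
  long int y;
} S;

/* Two long ints in one vector (GCC/Clang vector extension).  */
typedef long int v2li __attribute__ ((vector_size (2 * sizeof (long int))));

/* Scalar shape: two independent read-modify-write decrements,
   corresponding to the two subq instructions.  */
void foo_scalar (S *s)
{
  s->x--;
  s->y--;
}

/* Vector shape: load both fields, add { -1, -1 }, store both back.
   The all-ones constant is what pcmpeqd materializes in a register.  */
void foo_vector (S *s)
{
  v2li v;
  v2li minus_one = { -1, -1 };

  memcpy (&v, s, sizeof v);   /* stands in for the unaligned load */
  v += minus_one;             /* the single vector add */
  memcpy (s, &v, sizeof v);   /* stands in for the unaligned store */
}

/* Both shapes compute the same result.  */
void check_equivalent (void)
{
  S a = { 10, -3 };
  S b = { 10, -3 };

  foo_scalar (&a);
  foo_vector (&b);
  assert (a.x == b.x && a.y == b.y);
}
```

The vector shape needs a constant, a load, an add and a store, where the scalar shape needs only two memory-operand subtractions.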



So, are you sure _this_ basic-block is really the issue?
