[Bug tree-optimization/50713] SLP vs loop: code generated differs (SLP less efficient)

vincenzo.innocente at cern dot ch Thu, 25 Oct 2012 06:19:51 -0700


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50713




vincenzo Innocente <vincenzo.innocente at cern dot ch> changed:



           What    |Removed                     |Added

----------------------------------------------------------------------------

            Summary|SLP vs loop: code generated |SLP vs loop: code generated

                   |differs                     |differs (SLP less

                   |                            |efficient)



--- Comment #5 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 
2012-10-25 13:19:23 UTC ---

I think that here the problem is similar

using 4.8 trunk



in the following code the explicit  loop is way more efficient that the vector

notation

for all corei7, corei7-avx and bdver1 architectures.

(i spare you the assembler dumps. Inferring the number of instructions from nm

should enough

0000000000000170 T dfmav(double __vector, double __vector, double __vector,

double, double)

0000000000000370 T dfmal(double __vector, double __vector, double __vector,

double, double)

00000000000003c0 T dfma8v(double __vector, double __vector, double __vector,

double, double)

0000000000000810 T dfma8l(double __vector, double __vector, double __vector,

double, double)

00000000000008b0 short EH_frame1

)





typedef double __attribute__( ( vector_size( 32 ) ) ) float64x4_t;

typedef double __attribute__( ( vector_size( 64 ) ) ) float64x8_t;

float64x4_t dfmav(float64x4_t x, float64x4_t y, float64x4_t z, double a, double

b) {

    return a*x*(y+b*z); 

}



float64x4_t dfmal(float64x4_t x, float64x4_t y, float64x4_t z, double a, double

b) {

        float64x4_t r;

        for (int i=0; i!=4;++i)

          r[i] = a*x[i]*(y[i]+b*z[i]);

       return r;

}



float64x8_t dfma8v(float64x8_t x, float64x8_t y, float64x8_t z, double a,

double b) {

        return a*x*(y+b*z);

}



float64x8_t dfma8l(float64x8_t x, float64x8_t y, float64x8_t z, double a,

double b) {

        float64x8_t r;

        for (int i=0; i!=8;++i)

          r[i] = a*x[i]*(y[i]+b*z[i]);

       return r;

}

[Bug tree-optimization/50713] SLP vs loop: code generated differs (SLP less efficient)

Reply via email to