http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14741

--- Comment #26 from Evgeniy Dushistov <dushistov at mail dot ru> ---
I tried the following simple C++ function, compiled into a separate object
file (-march=native -Ofast):

void mult(const double * const __restrict__ A,
          const double * const __restrict__ B,
          double * const __restrict__ C, const size_t N)
{
    for (size_t j = 0; j < N; ++j)
        for (size_t i = 0; i < N; ++i)
            for (size_t k = 0; k < N; ++k)
                C[i * N + j] += A[i * N + k] + B[k * N + j];
}

$ time ./test_gcc 
204.800000

real    0m9.628s
user    0m9.620s
sys     0m0.000s

$ time ./test_icc 
204.800000

real    0m0.637s
user    0m0.630s
sys     0m0.000s


The GCC build is about 15 times slower.

It looks like the difference is here:
GCC:
Analyzing loop at mult.cpp:5
Analyzing loop at mult.cpp:6
Analyzing loop at mult.cpp:7

mult.cpp:3: note: vectorized 0 loops in function.

ICC:
mult.cpp(5): (col. 2) remark: PERMUTED LOOP WAS VECTORIZED.
mult.cpp(5): (col. 2) remark: PERMUTED LOOP WAS VECTORIZED.
mult.cpp(5): (col. 2) remark: PERMUTED LOOP WAS VECTORIZED.
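
ICC's remark says it vectorized a permuted loop nest, but the report does not
show which loop order ICC chose. As a minimal sketch, assuming the permutation
moves j to the innermost position so that C and B are both accessed with unit
stride, a hand-interchanged version might look like the code below. The names
mult_original, mult_interchanged and results_match are hypothetical and are
not part of the report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loop order from the report (j, i, k): the innermost loop reads B with
// stride N and accumulates into a single element of C.
void mult_original(const double* __restrict__ A, const double* __restrict__ B,
                   double* __restrict__ C, std::size_t N)
{
    for (std::size_t j = 0; j < N; ++j)
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t k = 0; k < N; ++k)
                C[i * N + j] += A[i * N + k] + B[k * N + j];
}

// Assumed interchange (i, k, j): the innermost loop now reads B and updates C
// with unit stride, and A[i * N + k] is invariant inside it.
void mult_interchanged(const double* __restrict__ A, const double* __restrict__ B,
                       double* __restrict__ C, std::size_t N)
{
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t k = 0; k < N; ++k) {
            const double a = A[i * N + k];
            for (std::size_t j = 0; j < N; ++j)
                C[i * N + j] += a + B[k * N + j];
        }
}

// Runs both versions on small integer-valued inputs and compares the results.
// Each C element receives its k terms in the same order in both versions.
bool results_match(std::size_t N)
{
    assert(N > 0);
    std::vector<double> A(N * N), B(N * N), C1(N * N, 0.0), C2(N * N, 0.0);
    for (std::size_t t = 0; t < N * N; ++t) {
        A[t] = static_cast<double>(t);
        B[t] = static_cast<double>(2 * t + 1);
    }
    mult_original(A.data(), B.data(), C1.data(), N);
    mult_interchanged(A.data(), B.data(), C2.data(), N);
    return C1 == C2;
}
```

Compiling both functions with the same flags and comparing the vectorizer
reports would show whether GCC vectorizes the interchanged form; that has not
been checked here.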
