"mal content" <[EMAIL PROTECTED]> writes: > Apologies if this is the wrong list.
It's the wrong list.  This should go to [EMAIL PROTECTED]  Please
send any followups there.  Thanks.

> float *vector_add4f(float va[4], const float vb[4])
> {
>   va[0] += vb[0];
>   va[1] += vb[1];
>   va[2] += vb[2];
>   va[3] += vb[3];
>   return va;
> }

> Using -march=pentium3 -mtune=pentium3m -mfpmath=sse, the following
> is generated:

It looks like you didn't use -O.  But even if you do, you won't get a
vector add instruction.  It is possible that this function will be
called with overlapping pointers.  In particular, it is possible that
the assignment to va[0] changes vb[1].  Therefore, this code can not
be vectorized.

Even if you fix that, gcc will only vectorize if you pass the
-ftree-vectorize option.  And it will only vectorize code in loops.
And it unfortunately doesn't do a good job of using movups, so it will
mess around with checking the alignment.  And there isn't a good way
to specify alignment.

I do see use of the vector instructions for this example

float *vector_add4f(float * __restrict va, float * __restrict vb)
{
  int i;
  for (i = 0; i < 4; ++i)
    va[i] += vb[i];
  return va;
}

if I compile with -O2 -ftree-vectorize.  Frankly the generated code is
really awful, and I wouldn't be surprised if it runs more slowly than
the non-vectorized code.  This is evidently an area where the compiler
could use more work.

Ian
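
To make the overlap point concrete, here is a rough sketch of a call where the store to va[0] really does change vb[1]. The function vector_add4f_simd_model is hypothetical: it is not what gcc emits, only a plain C model of "load all four lanes of each operand, then store all four sums", which is how a single packed add would behave. The helper overlap_changes_result and the test values are likewise made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* The function from the original post: four scalar adds, in order. */
float *vector_add4f(float va[4], const float vb[4])
{
  va[0] += vb[0];
  va[1] += vb[1];
  va[2] += vb[2];
  va[3] += vb[3];
  return va;
}

/* Hypothetical model of a 4-wide vector add (an assumption, not gcc
   output): read both operands completely, then write all four sums. */
float *vector_add4f_simd_model(float *va, const float *vb)
{
  float a[4], b[4];
  int i;
  memcpy(a, va, sizeof a);
  memcpy(b, vb, sizeof b);
  for (i = 0; i < 4; ++i)
    va[i] = a[i] + b[i];
  return va;
}

/* Returns 1 if the two versions disagree when va and vb overlap. */
int overlap_changes_result(void)
{
  float s[5] = {1, 2, 3, 4, 5};
  float v[5] = {1, 2, 3, 4, 5};

  /* va = buf + 1 and vb = buf, so va[0] and vb[1] are the same object. */
  vector_add4f(s + 1, s);
  vector_add4f_simd_model(v + 1, v);

  assert(s[2] == 6.0f);  /* scalar order: 3 + (2 + 1) */
  assert(v[2] == 5.0f);  /* load-then-store: 3 + 2 */
  return memcmp(s, v, sizeof s) != 0;
}
```

With these inputs the scalar version leaves {1, 3, 6, 10, 15} in the buffer and the load-then-store model leaves {1, 3, 5, 7, 9}, so overlap_changes_result() returns 1.  Since the compiler has to preserve the first result for any legal call, it can't substitute the second unless something like __restrict rules the overlap out.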