https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115466
--- Comment #4 from Carl Love <carll at gcc dot gnu.org> ---
comment 1

Yes, I can confirm that if I add the alignment statement to the declarations, it works fine.

I originally tried to use the built-in as part of a rewrite of a function in the Milvus AI source code. The function fvec_L2sqr_batch_4_ref() accounts for about 90% of the workload run time. The code is:

fvec_L2sqr_batch_4_ref_v1(const float* x, const float* y0, const float* y1,
                          const float* y2, const float* y3, const size_t d,
                          float& dis0, float& dis1, float& dis2, float& dis3) {
  float d0 = 0;
  float d1 = 0;
  float d2 = 0;
  float d3 = 0;

  for (size_t i = 0; i < d; ++i) {
    const float q0 = x[i] - y0[i];
    const float q1 = x[i] - y1[i];
    const float q2 = x[i] - y2[i];
    const float q3 = x[i] - y3[i];
    d0 += q0 * q0;
    d1 += q1 * q1;
    d2 += q2 * q2;
    d3 += q3 * q3;
  }

  dis0 = d0;
  dis1 = d1;
  dis2 = d2;
  dis3 = d3;
}

When compiled with -O3, it does generate vsx instructions, but I noticed that it was not using the vsx multiply-add instructions. So I tried rewriting it to explicitly load a vector with the vec_ld built-in, followed by vec_sub and vec_madd. That rewrite, by the way, gives a 45% reduction in the execution time for my standalone test.

At this point, I am not sure where the arguments get defined, so adding the alignment to the declarations is not so easy. That said, Peter mentioned the vec_xl built-in, which does seem to work. From what I see in the PVIPR, vec_xl does not require the data to be aligned.

comment 3, Segher

I was looking at the PVIPR document when I chose the built-in for my rewrite. Looking at the documentation, it does say "Load a 16-byte vector from the memory address specified by the displacement and the pointer, ignoring the low-order bits of the calculated address." In retrospect, I should have picked up on the ignoring of the low-order bits as implying that the addresses need to be aligned. It would really be good if the documentation explicitly said the data must be 16-byte aligned.
That said, my bad for not reading/understanding the documentation well enough.

The vec_xl documentation does not say anything about ignoring the lower bits of the address. So, in my case, that is a better load built-in to use, since I don't have to try to find all the declarations for the arrays that could be passed to the function.

It would be great to update the PVIPR to be more explicit about the alignment needs.

Sorry everyone for the noise. I think we can close the issue as "User error".
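
For anyone who lands on this bug later, here is a minimal, portable sketch of the addressing difference discussed above. It does not use the real built-ins: emulated_vec_ld and emulated_vec_xl are hypothetical stand-ins written for illustration. They assume that vec_ld clears the low four bits of the calculated address (as the quoted PVIPR text describes) and that vec_xl uses the address unchanged. Endianness and the actual vector types are deliberately left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 16-byte vector of four floats.
using Vec4f = std::array<float, 4>;

// Emulates the addressing described for vec_ld: the low four bits of
// (ptr + offset) are dropped, so the load always starts on a 16-byte boundary.
// Assumption: this mirrors the "ignoring the low-order bits" wording only.
inline Vec4f emulated_vec_ld(std::ptrdiff_t offset, const float* ptr) {
    auto addr = reinterpret_cast<std::uintptr_t>(ptr) +
                static_cast<std::uintptr_t>(offset);
    addr &= ~static_cast<std::uintptr_t>(15);  // round down to a multiple of 16
    Vec4f out;
    std::memcpy(out.data(), reinterpret_cast<const void*>(addr), sizeof(out));
    return out;
}

// Emulates an unaligned load in the style of vec_xl: the address is used as-is.
inline Vec4f emulated_vec_xl(std::ptrdiff_t offset, const float* ptr) {
    const auto* src = reinterpret_cast<const unsigned char*>(ptr) + offset;
    Vec4f out;
    std::memcpy(out.data(), src, sizeof(out));
    return out;
}

// Returns true when the two emulated loads disagree for a pointer that sits
// 4 bytes past a 16-byte boundary.
inline bool misaligned_loads_differ() {
    alignas(16) static const float buf[8] = {0.0f, 1.0f, 2.0f, 3.0f,
                                             4.0f, 5.0f, 6.0f, 7.0f};
    const float* p = buf + 1;               // misaligned by one float
    const Vec4f a = emulated_vec_ld(0, p);  // starts at buf[0]
    const Vec4f b = emulated_vec_xl(0, p);  // starts at buf[1]
    return a != b;
}
```

With a pointer one float past an aligned buffer, the vec_ld-style load silently returns the four floats starting at the aligned address below it, while the vec_xl-style load returns the four floats the caller actually asked for. That is consistent with the wrong results going away once the arrays were declared with 16-byte alignment.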