Hi,

I'm studying the automatic vectorization optimizations in GCC, and I found a case that the SLP vectorizer fails to vectorize.
Here is the sample code (a simplified version of a function from the 625/525.x264 source code in SPEC CPU 2017):

    void pixel_sub_wxh(int16_t *diff, uint8_t *pix1, uint8_t *pix2) {
        for (int y = 0; y < 4; y++) {
            for (int x = 0; x < 4; x++)
                diff[x + y * 4] = pix1[x] - pix2[x];
            pix1 += 16;
            pix2 += 32;
        }
    }

When I compile it with `-O3 -mavx2` or `-O3 -msse4.2`, the SLP vectorizer fails to vectorize it. Adding `-fopt-info-vec-all` gives the following messages (the inner loop will be unrolled):

    <source>:6:21: optimized: loop vectorized using 8 byte vectors
    <source>:6:21: optimized: loop versioned for vectorization because of possible aliasing
    <source>:5:6: note: vectorized 1 loops in function.
    <source>:5:6: note: ***** Analysis failed with vector mode V8SI
    <source>:5:6: note: ***** The result for vector mode V32QI would be the same
    <source>:5:6: note: ***** Re-trying analysis with vector mode V16QI
    <source>:5:6: note: ***** Analysis failed with vector mode V16QI
    <source>:5:6: note: ***** Re-trying analysis with vector mode V8QI
    <source>:5:6: note: ***** Analysis failed with vector mode V8QI
    <source>:5:6: note: ***** Re-trying analysis with vector mode V4QI
    <source>:5:6: note: ***** Analysis failed with vector mode V4QI

If I rewrite the code by hand using the vector types declared in `immintrin.h`, I get the following, which is what I hoped the SLP vectorizer would be able to produce:

    void pixel_sub_wxh_vec(int16_t *diff, uint8_t *pix1, uint8_t *pix2) {
        for (int y = 0; y < 4; y++) {
            __v4hi pix1_v = {pix1[0], pix1[1], pix1[2], pix1[3]};
            __v4hi pix2_v = {pix2[0], pix2[1], pix2[2], pix2[3]};
            __v4hi diff_v = pix1_v - pix2_v;
            *(long long *)(diff + y * 4) = (long long)diff_v;
            pix1 += 16;
            pix2 += 32;
        }
    }

What I want to know is:

1. Why can't the SLP vectorizer vectorize this code?
2. What changes would I need to make, either to the SLP vectorizer or to the source code, to get it vectorized?

Thanks,
Hanke Zhang
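P.S. In case it helps to reproduce this, below is a small self-contained check that the scalar function and the hand-written vector version compute the same result. It is only a sketch under a few assumptions: it relies on the GCC/Clang `vector_size` extension rather than `immintrin.h`, the names `v4hi_t` and `count_mismatches` are made up for this example, and the store uses `memcpy` instead of the `long long` pointer cast to stay clear of aliasing issues. It checks equivalence only and says nothing about what the SLP vectorizer does with either function.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang vector extension. Four 16-bit lanes in 8 bytes.
   v4hi_t is a hypothetical local name standing in for __v4hi. */
typedef int16_t v4hi_t __attribute__((vector_size(8)));

/* Scalar reference, same as the sample in the message. */
void pixel_sub_wxh(int16_t *diff, uint8_t *pix1, uint8_t *pix2)
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            diff[x + y * 4] = pix1[x] - pix2[x];
        pix1 += 16;
        pix2 += 32;
    }
}

/* Hand-vectorized variant: one 4-lane subtract per row. */
void pixel_sub_wxh_vec(int16_t *diff, uint8_t *pix1, uint8_t *pix2)
{
    for (int y = 0; y < 4; y++) {
        v4hi_t a = {pix1[0], pix1[1], pix1[2], pix1[3]};
        v4hi_t b = {pix2[0], pix2[1], pix2[2], pix2[3]};
        v4hi_t d = a - b;
        memcpy(diff + y * 4, &d, sizeof d); /* store 4 x int16_t */
        pix1 += 16;
        pix2 += 32;
    }
}

/* Hypothetical helper: runs both versions on the same input and
   returns the number of output elements that differ. */
int count_mismatches(void)
{
    uint8_t pix1[4 * 16], pix2[4 * 32];
    int16_t ref[16], vec[16];
    int bad = 0;

    /* Arbitrary deterministic fill values. */
    for (int i = 0; i < 4 * 16; i++)
        pix1[i] = (uint8_t)(i * 37 + 11);
    for (int i = 0; i < 4 * 32; i++)
        pix2[i] = (uint8_t)(i * 101 + 200);

    pixel_sub_wxh(ref, pix1, pix2);
    pixel_sub_wxh_vec(vec, pix1, pix2);

    for (int i = 0; i < 16; i++)
        bad += (ref[i] != vec[i]);
    return bad;
}
```

Calling `count_mismatches()` from a `main` and checking that it returns 0 is enough to confirm the two versions agree on this input.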