Hello,

I have an example with two very similar loops. The cunrolli pass completely unrolls one loop but not the other, because their cost estimates differ slightly. The loop that is not unrolled gets SLP-vectorized and is then unrolled by the "cunroll" pass, whereas the unrolled loop cannot be vectorized because it is no longer a loop. In the end, there is a big performance difference between the two loops.
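To make the difference concrete, here is a hand-written C approximation (not actual GCC output) of what complete unrolling does to a 4-iteration copy loop like the first one in the example function foo. After unrolling, the loop is gone and four isomorphic scalar load/store pairs remain in a single basic block, which a loop-based vectorizer no longer sees. The names copy4_loop and copy4_unrolled are hypothetical and only serve the illustration.

```c
/* Original form: a counted loop with a known trip count of 4. */
void copy4_loop(int *restrict dst, const int *restrict src)
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[i];
}

/* Roughly what complete unrolling leaves behind: straight-line code,
   four independent scalar copies in one basic block and no loop. */
void copy4_unrolled(int *restrict dst, const int *restrict src)
{
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];
}
```

Both functions compute the same result; only the shape of the code handed to later passes differs.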
My question is why SLP vectorization has to be performed on a loop (it is a sub-pass under pass_tree_loop). Conceptually, couldn't it be done on any basic block? Our port is still stuck at 4.5, but I checked 4.7 and it seems to be the same there. I also looked at the functions in tree-vect-slp.c. They use loop_vinfo structures heavily, but in some places they check whether loop_vinfo exists and fall back to an alternative when it does not. I tried to add an extra SLP pass after pass_tree_loop, but it didn't work. I wonder how easy it would be to make SLP work outside loops.

Thanks,
Bingfeng Mei
Broadcom UK

void foo (int *__restrict__ temp_hist_buffer,
          int *__restrict__ p_hist_buff,
          int *__restrict__ p_input)
{
  int i;
  for (i = 0; i < 4; i++)
    temp_hist_buffer[i] = p_hist_buff[i];
  for (i = 0; i < 4; i++)
    temp_hist_buffer[i + 4] = p_input[i];
}
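As a rough sketch of what basic-block SLP could produce for foo once both loops are fully unrolled: the eight scalar copies form two groups of four adjacent ints, and each group can become one 16-byte vector load plus one vector store. This is a hand-written approximation, assuming 32-bit int and using the GCC/Clang vector extension (vector_size attribute); foo_vec and the v4si typedef are hypothetical names, and this is not what any GCC version is claimed to emit.

```c
#include <string.h>

/* GCC/Clang vector extension (assumption: 32-bit int): four ints in one
   16-byte vector. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical hand-vectorized foo: each group of four adjacent scalar
   copies becomes a single vector load and a single vector store.
   memcpy is used so that no alignment assumption is made. */
void foo_vec(int *__restrict__ temp_hist_buffer,
             int *__restrict__ p_hist_buff,
             int *__restrict__ p_input)
{
    v4si v;

    /* Group 1: temp_hist_buffer[0..3] = p_hist_buff[0..3] */
    memcpy(&v, p_hist_buff, sizeof v);
    memcpy(temp_hist_buffer, &v, sizeof v);

    /* Group 2: temp_hist_buffer[4..7] = p_input[0..3] */
    memcpy(&v, p_input, sizeof v);
    memcpy(temp_hist_buffer + 4, &v, sizeof v);
}
```

The point of the sketch is that nothing in this transformation depends on a loop being present; the grouping is found purely within one basic block.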