Hello,
I have an example with two very similar loops. The cunrolli pass unrolls
one loop completely but not the other, because their cost estimates
differ slightly. The loop that is not unrolled gets SLP-vectorized and is
then unrolled by the "cunroll" pass, whereas the loop that was unrolled
cannot be vectorized since it is no longer a loop. In the end, there is a
big performance difference between the two loops.

My question is why SLP vectorization has to be performed on loops (it is
a sub-pass under pass_tree_loop). Conceptually, can't it be done on any
basic block? Our port is still stuck at 4.5, but I checked 4.7 and it
seems to be the same. I also checked the functions in tree-vect-slp.c.
They use loop_vinfo structures a lot, but in some places the code checks
whether loop_vinfo exists and otherwise uses an alternative. I tried
adding an extra SLP pass after pass_tree_loop, but it didn't work. I
wonder how easy it would be to make SLP work for non-loop code.

Thanks,
Bingfeng Mei

Broadcom UK

void foo (int *__restrict__ temp_hist_buffer, 
          int * __restrict__ p_hist_buff, 
          int *__restrict__ p_input)
{
  int i;
  for(i=0;i<4;i++)
     temp_hist_buffer[i]=p_hist_buff[i];

  for(i=0;i<4;i++)
     temp_hist_buffer[i+4]=p_input[i];

}
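
For illustration, here is a rough hand-written sketch of what the function
looks like once one of the loops has been completely unrolled. Which loop
cunrolli actually unrolls is not stated above, so unrolling the first one is
an assumption, and foo_unrolled is a hypothetical name. The point is that
the four copies become straight-line statements in a single basic block,
which is the form a non-loop SLP pass would have to handle.

```c
/* Hypothetical hand-written equivalent of foo() after one loop has been
   completely unrolled.  Which loop gets unrolled is an assumption.  */
void foo_unrolled (int *__restrict__ temp_hist_buffer,
                   int *__restrict__ p_hist_buff,
                   int *__restrict__ p_input)
{
  int i;

  /* First loop, fully unrolled: four adjacent loads and stores that now
     sit in one basic block with no enclosing loop.  */
  temp_hist_buffer[0] = p_hist_buff[0];
  temp_hist_buffer[1] = p_hist_buff[1];
  temp_hist_buffer[2] = p_hist_buff[2];
  temp_hist_buffer[3] = p_hist_buff[3];

  /* Second loop, still a loop when the vectorizer sees it.  */
  for (i = 0; i < 4; i++)
    temp_hist_buffer[i + 4] = p_input[i];
}
```

Both forms compute the same result; only the shape seen by the vectorizer
differs.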

