Ira,

Thank you very much for the quick answer. I will check 4.7 on x86-64 to see how it differs from our port. Is there any significant change between 4.5 and 4.7 regarding SLP?

Cheers,
Bingfeng

> -----Original Message-----
> From: Ira Rosen [mailto:i...@il.ibm.com]
> Sent: 01 November 2011 11:13
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: SLP vectorizer on non-loop?
>
> gcc-ow...@gcc.gnu.org wrote on 01/11/2011 12:41:32 PM:
>
> > Hello,
> > I have one example with two very similar loops. cunrolli pass
> > unrolls one loop completely but not the other based on slightly
> > different cost estimations. The not-unrolled loop get
> > SLP-vectorized, then unrolled by "cunroll" pass, whereas the other
> > unrolled loop cannot be vectorized since it is not a loop any more.
> > In the end, there is big difference of performance between two
> > loops.
> >
>
> Here what I see with the current trunk on x86_64 with -O3 (with the
> two loops split into different functions):
>
> The first loop, the one that doesn't get unrolled by cunrolli, gets
> loop vectorized with -fno-vect-cost-model. With the cost model the
> vectorization fails because the number of iterations is not
> sufficient (the vectorizer tries to apply loop peeling in order to
> align the accesses), the loop gets later unrolled by cunroll and the
> basic block gets vectorized by SLP.
>
> The second loop, unrolled by cunrolli, also gets vectorized by SLP.
>
> The *.optimized dumps look similar:
>
> <bb 2>:
>   vect_var_.14_48 = MEM[(int *)p_hist_buff_9(D)];
>   MEM[(int *)temp_hist_buffer_5(D)] = vect_var_.14_48;
>   return;
>
> <bb 2>:
>   vect_var_.7_57 = MEM[(int *)p_input_10(D)];
>   MEM[(int *)temp_hist_buffer_6(D) + 16B] = vect_var_.7_57;
>   return;
>
> > My question is why SLP vectorization has to be performed on loop
> > (it is a sub-pass under pass_tree_loop). Conceptually, cannot it be
> > done on any basic block? Our port are still stuck at 4.5. But I
> > checked 4.7, it seems still the same. I also checked functions in
> > tree-vect-slp.c. They use a lot of loop_vinfo structures. But in
> > some places it checks whether loop_vinfo exists to use it or other
> > alternative. I tried to add an extra SLP pass after pass_tree_loop,
> > but it didn't work. I wonder how easy to make SLP works for
> > non-loop.
>
> SLP vectorization works both on loops (in vectorize pass) and on
> basic blocks (in slp-vectorize pass).
>
> Ira
>
> >
> > Thanks,
> > Bingfeng Mei
> >
> > Broadcom UK
> >
> > void foo (int *__restrict__ temp_hist_buffer,
> >           int * __restrict__ p_hist_buff,
> >           int *__restrict__ p_input)
> > {
> >   int i;
> >   for(i=0;i<4;i++)
> >     temp_hist_buffer[i]=p_hist_buff[i];
> >
> >   for(i=0;i<4;i++)
> >     temp_hist_buffer[i+4]=p_input[i];
> >
> > }
> >
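
For anyone who wants to reproduce the comparison, here is a minimal sketch of the test case with the two loops split into different functions, as Ira describes. The function names copy_hist, copy_input and check_split_copies are my own placeholders, not taken from Ira's actual test, and the check function is only there to confirm that the split version still fills the buffer the same way foo() does.

```c
#include <assert.h>

/* Hypothetical names: the thread only says the two loops of foo()
   were split into different functions, not what they were called.  */

/* First loop of foo(): copies p_hist_buff into elements 0..3.  */
void copy_hist (int *__restrict__ temp_hist_buffer,
                int *__restrict__ p_hist_buff)
{
  int i;
  for (i = 0; i < 4; i++)
    temp_hist_buffer[i] = p_hist_buff[i];
}

/* Second loop of foo(): copies p_input into elements 4..7.  */
void copy_input (int *__restrict__ temp_hist_buffer,
                 int *__restrict__ p_input)
{
  int i;
  for (i = 0; i < 4; i++)
    temp_hist_buffer[i + 4] = p_input[i];
}

/* Sanity check: after both calls the buffer holds 1..8 in order.  */
void check_split_copies (void)
{
  int dst[8] = { 0 };
  int hist[4] = { 1, 2, 3, 4 };
  int input[4] = { 5, 6, 7, 8 };
  int i;

  copy_hist (dst, hist);
  copy_input (dst, input);
  for (i = 0; i < 8; i++)
    assert (dst[i] == i + 1);
}
```

Compiling this with -O3 plus -fdump-tree-optimized should produce *.optimized dumps to compare against the ones quoted above, and adding -fno-vect-cost-model should show the loop-vectorized variant of the first function. The exact dump contents will depend on the GCC version and target, so I would not expect them to match on our 4.5 port.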