Ira,
Thank you very much for the quick answer. I will check 4.7 on x86-64
to see how it differs from our port. Is there a significant change
between 4.5 and 4.7 regarding SLP?

Cheers,
Bingfeng

> -----Original Message-----
> From: Ira Rosen [mailto:i...@il.ibm.com]
> Sent: 01 November 2011 11:13
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: SLP vectorizer on non-loop?
> 
> 
> 
> gcc-ow...@gcc.gnu.org wrote on 01/11/2011 12:41:32 PM:
> 
> > Hello,
> > I have one example with two very similar loops. The cunrolli pass unrolls
> > one loop completely but not the other, based on slightly different cost
> > estimates. The loop that is not unrolled gets SLP-vectorized and is then
> > unrolled by the "cunroll" pass, whereas the unrolled loop cannot be
> > vectorized since it is not a loop any more. In the end, there is a big
> > performance difference between the two loops.
> >
> 
> Here is what I see with the current trunk on x86_64 with -O3 (with the two
> loops split into different functions):
> 
> The first loop, the one that doesn't get unrolled by cunrolli, gets loop
> vectorized with -fno-vect-cost-model. With the cost model, the vectorization
> fails because the number of iterations is not sufficient (the vectorizer
> tries to apply loop peeling in order to align the accesses); the loop later
> gets unrolled by cunroll and the basic block gets vectorized by SLP.
> 
> The second loop, unrolled by cunrolli, also gets vectorized by SLP.
> 
> The *.optimized dumps look similar:
> 
> 
> <bb 2>:
>   vect_var_.14_48 = MEM[(int *)p_hist_buff_9(D)];
>   MEM[(int *)temp_hist_buffer_5(D)] = vect_var_.14_48;
>   return;
> 
> 
> <bb 2>:
>   vect_var_.7_57 = MEM[(int *)p_input_10(D)];
>   MEM[(int *)temp_hist_buffer_6(D) + 16B] = vect_var_.7_57;
>   return;
> 
> 
> > My question is why SLP vectorization has to be performed on a loop (it is
> > a sub-pass under pass_tree_loop). Conceptually, can't it be done on any
> > basic block? Our port is still stuck at 4.5, but I checked 4.7 and it
> > seems to be the same. I also checked the functions in tree-vect-slp.c.
> > They use a lot of loop_vinfo structures, but in some places the code
> > checks whether loop_vinfo exists and otherwise uses an alternative. I
> > tried to add an extra SLP pass after pass_tree_loop, but it didn't work.
> > I wonder how easy it would be to make SLP work for non-loop code.
> 
> SLP vectorization works both on loops (in vectorize pass) and on basic
> blocks (in slp-vectorize pass).
> 
> Ira
> 
> >
> > Thanks,
> > Bingfeng Mei
> >
> > Broadcom UK
> >
> > void foo (int *__restrict__ temp_hist_buffer,
> >           int * __restrict__ p_hist_buff,
> >           int *__restrict__ p_input)
> > {
> >   int i;
> >   for(i=0;i<4;i++)
> >      temp_hist_buffer[i]=p_hist_buff[i];
> >
> >   for(i=0;i<4;i++)
> >      temp_hist_buffer[i+4]=p_input[i];
> >
> > }
> >
> >
> 
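
For anyone comparing the two cases on their own port, here is a minimal
sketch (not taken from the thread) that puts the posted loop form next to a
hand-unrolled, straight-line form. The straight-line version is roughly the
kind of single basic block that basic-block SLP would have to handle once
cunrolli has removed the loop. The names foo_unrolled and check_equivalence
are hypothetical, and whether either function actually gets vectorized
depends on the GCC version, the target and the cost model.

```c
#include <assert.h>
#include <string.h>

/* Loop form, as posted in the original message. */
void foo(int *__restrict__ temp_hist_buffer,
         int *__restrict__ p_hist_buff,
         int *__restrict__ p_input)
{
  int i;
  for (i = 0; i < 4; i++)
    temp_hist_buffer[i] = p_hist_buff[i];

  for (i = 0; i < 4; i++)
    temp_hist_buffer[i + 4] = p_input[i];
}

/* Hypothetical hand-unrolled form: no loop is left, only eight
   adjacent loads and stores in one basic block. */
void foo_unrolled(int *__restrict__ temp_hist_buffer,
                  int *__restrict__ p_hist_buff,
                  int *__restrict__ p_input)
{
  temp_hist_buffer[0] = p_hist_buff[0];
  temp_hist_buffer[1] = p_hist_buff[1];
  temp_hist_buffer[2] = p_hist_buff[2];
  temp_hist_buffer[3] = p_hist_buff[3];

  temp_hist_buffer[4] = p_input[0];
  temp_hist_buffer[5] = p_input[1];
  temp_hist_buffer[6] = p_input[2];
  temp_hist_buffer[7] = p_input[3];
}

/* Hypothetical helper: runs both forms on the same inputs and
   returns 1 if they wrote identical output buffers. */
int check_equivalence(void)
{
  int hist[4] = {1, 2, 3, 4};
  int input[4] = {5, 6, 7, 8};
  int a[8] = {0};
  int b[8] = {0};
  int same;

  foo(a, hist, input);
  foo_unrolled(b, hist, input);

  same = memcmp(a, b, sizeof a) == 0;
  assert(same);
  return same;
}
```

Compiling this with -O3 (optionally adding -ftree-slp-vectorize and
-fno-vect-cost-model) and comparing the *.optimized dumps of foo and
foo_unrolled should show whether the port vectorizes the straight-line
block. A dump option along the lines of -fdump-tree-slp-details may help,
but the exact dump name is an assumption here and can differ between GCC
versions.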

