https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70686

--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 18 Apr 2016, alekshs at hotmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70686
> 
> --- Comment #2 from alekshs at hotmail dot com ---
> (In reply to Richard Biener from comment #1)
> > It's not so mind-blowing - it's simply that -fprofile-generate makes our
> > GIMPLE level if-conversion no longer apply.  Without -fprofile-generate
> > we if-convert the loop into
> > 
> >  for (i = 1; i <100000001; i++) 
> >  {
> >  ...
> >     
> >    b = b + ((b < 1.00001) ? i + 12.43 : 0.0);
> > ...
> > }
> > 
> > thus we always evaluate the i + 12.43 and one additional addition of zero.
> > 
> > We do this to eventually enable vectorization but without any check
> > on whether it would be profitable when not vectorizing (your testcase
> > shows it's not profitable).
> > 
> > Confirmed.  -fno-tree-loop-if-convert should fix it in this particular case.
> 
> Aha, thanks for the swift reply.
> 
> Regarding profitability, I should note that PGO entirely misses the fact
> that 20 mulsd could become 10 mulpd:
> 
> 
>   400560:       f2 0f 59 e9             mulsd  %xmm1,%xmm5
>   400564:       f2 0f 59 e1             mulsd  %xmm1,%xmm4
>   400568:       f2 0f 59 d9             mulsd  %xmm1,%xmm3
>   40056c:       f2 0f 59 d1             mulsd  %xmm1,%xmm2
>   400570:       f2 0f 59 e9             mulsd  %xmm1,%xmm5
>   400574:       f2 0f 59 e1             mulsd  %xmm1,%xmm4
>   400578:       f2 0f 59 d9             mulsd  %xmm1,%xmm3
>   40057c:       f2 0f 59 d1             mulsd  %xmm1,%xmm2
>   400580:       f2 0f 59 e9             mulsd  %xmm1,%xmm5
>   400584:       f2 0f 59 e1             mulsd  %xmm1,%xmm4
>   400588:       f2 0f 59 d9             mulsd  %xmm1,%xmm3
>   40058c:       f2 0f 59 d1             mulsd  %xmm1,%xmm2
>   400590:       f2 0f 59 e9             mulsd  %xmm1,%xmm5
>   400594:       f2 0f 59 e1             mulsd  %xmm1,%xmm4
>   400598:       f2 0f 59 d9             mulsd  %xmm1,%xmm3
>   40059c:       f2 0f 59 d1             mulsd  %xmm1,%xmm2
>   4005a0:       f2 0f 59 e9             mulsd  %xmm1,%xmm5
>   4005a4:       f2 0f 59 e1             mulsd  %xmm1,%xmm4
>   4005a8:       f2 0f 59 d9             mulsd  %xmm1,%xmm3
>   4005ac:       f2 0f 59 d1             mulsd  %xmm1,%xmm2
> 
> 
> ...So there was a job to be done there. That's at -O3 -march=native, btw (to
> preserve accuracy, unlike -Ofast). -Ofast doesn't pack them either. It kind of
> splits the work into scalar muls and packed adds.

Vectorization is confused because you are computing a reduction that is
broken by the if ().  This isn't easily vectorized.
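
To make the point concrete, here is a minimal sketch of the two loop shapes
discussed above. It is not the reporter's testcase (which is elided in the
quote); the function names, the parameters b and n, and the helper
check_equivalent are hypothetical and only the condition and the addend are
taken from the quoted expression. Note that the condition reads b, which the
previous iteration may have just updated, so each iteration depends on the one
before it.

```c
#include <assert.h>

/* Branchy form: the addition only executes when the condition holds. */
double reduce_branchy(double b, int n)
{
    for (int i = 1; i < n; i++) {
        if (b < 1.00001)
            b = b + (i + 12.43);
    }
    return b;
}

/* If-converted form: the addition always executes; when the condition
   is false the addend is 0.0, so b is left unchanged. */
double reduce_if_converted(double b, int n)
{
    for (int i = 1; i < n; i++) {
        double addend = (b < 1.00001) ? i + 12.43 : 0.0;
        b = b + addend;
    }
    return b;
}

/* Hypothetical helper: both forms should agree for the same inputs. */
void check_equivalent(double b, int n)
{
    assert(reduce_branchy(b, n) == reduce_if_converted(b, n));
}
```

The second form does strictly more arithmetic per iteration, which is why it
only pays off if the loop is then actually vectorized.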
