[Bug tree-optimization/37021] Fortran Complex reduction / multiplication not vectorized

wschmidt at gcc dot gnu.org Thu, 27 Aug 2015 13:56:42 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021


--- Comment #22 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #21)
> (In reply to Bill Schmidt from comment #20)

...<snip>...
> 
> I see it only failing due to cost issues (tried ppc64le and -mcpu=power8).
> The unaligned loads cost 3 and we end up with
> 
> t.f90:8:0: note: Cost model analysis:
>   Vector inside of loop cost: 40
>   Vector prologue cost: 8
>   Vector epilogue cost: 4
>   Scalar iteration cost: 12
>   Scalar outside cost: 6
>   Vector outside cost: 12
>   prologue iterations: 0
>   epilogue iterations: 0
> t.f90:8:0: note: cost model: the vector iteration cost = 40 divided by the
> scalar iteration cost = 12 is greater or equal to the vectorization factor =
> 1.
> 
> Note that we are (still) not very good in estimating the SLP cost as we
> account 4 vector loads here (because we essentially will end up with
> 4 different permutations used), so the "unaligned" part is accounted for
> too much and likely the permutation cost as well.  Both are a limitation
> of the SLP data structures and not easily fixable.  With
> -fvect-cost-model=unlimited I see both loops vectorized.

Yes, I get these same results for the loop vectorizer (using -O2
-ftree-vectorize -mcpu=power8 -ffast-math).  But I was looking at the failure
to do SLP vectorization.  In comment 19 you indicated this was now working,
presumably on x86, but for Power we fail to SLP-vectorize
fast-math-pr37021.f90:9:0.

However, with today's trunk my SLP dump looks slightly different, so I need to
take another look at whether this is still failing due to alignment or
something else.  I'll comment again when I've dug into it further.

[Bug tree-optimization/37021] Fortran Complex reduction / multiplication not vectorized
