Kyrylo Tkachov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi all,
>
> While experimenting with some backend costs for Advanced SIMD and SVE I hit
> many cases where GCC would pick SVE for VLA auto-vectorisation even when the
> backend very clearly presented cheaper costs for Advanced SIMD.
> For a simple float addition loop the SVE costs were:
>
> vec.c:9:21: note: Cost model analysis:
>   Vector inside of loop cost: 28
>   Vector prologue cost: 2
>   Vector epilogue cost: 0
>   Scalar iteration cost: 10
>   Scalar outside cost: 0
>   Vector outside cost: 2
>   prologue iterations: 0
>   epilogue iterations: 0
>   Minimum number of vector iterations: 1
>   Calculated minimum iters for profitability: 4
>
> and for Advanced SIMD (Neon) they're:
>
> vec.c:9:21: note: Cost model analysis:
>   Vector inside of loop cost: 11
>   Vector prologue cost: 0
>   Vector epilogue cost: 0
>   Scalar iteration cost: 10
>   Scalar outside cost: 0
>   Vector outside cost: 0
>   prologue iterations: 0
>   epilogue iterations: 0
>   Calculated minimum iters for profitability: 0
> vec.c:9:21: note: Runtime profitability threshold = 4
Just to expand on this for others on the list: this is comparing SVE
with an estimated VL of 256 bits with Advanced SIMD at 128 bits, so for
8 floats it's 28 vs 22 (2 × 11). For generic SVE we'd justify using VLA
in that situation because the gap is relatively small and SVE would
(according to the cost model) be a clear win beyond 256 bits. But in
this case, the 256-bit VL estimate comes directly from a -mcpu/-mtune
option, so it is more definite than the usual estimates for generic SVE.
What happens at larger VL is pretty much irrelevant in this case.

> yet the SVE one was always picked. With guidance from Richard this seems to
> be due to the vinfo comparisons in vect_better_loop_vinfo_p, in particular the
> part with the big comment explaining the
> estimated_rel_new * 2 <= estimated_rel_old heuristic.
>
> This patch extends the comparisons by introducing a three-way estimate
> kind for poly_int values that the backend can distinguish.
> This allows vect_better_loop_vinfo_p to ask for minimum, maximum and likely
> estimates and pick Advanced SIMD over SVE when it is clearly cheaper.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Manually checked that with reasonable separate costs for Advanced SIMD and SVE
> GCC picks up one over the other in ways I'd expect.
>
> Ok for trunk?
> Thanks,
> Kyrill
>
> gcc/
> 	* target.h (enum poly_value_estimate_kind): Define.
> 	(estimated_poly_value): Take an estimate kind argument.
> 	* target.def (estimated_poly_value): Update definition for the above.
> 	* doc/tm.texi: Regenerate.
> 	* tree-vect-loop.c (vect_better_loop_vinfo_p): Use min, max and likely
> 	estimates of VF to pick between vinfos.
> 	* config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Use
> 	estimated_poly_value instead of aarch64_estimated_poly_value.
> 	(aarch64_estimated_poly_value): Take a kind argument and handle it.

OK, thanks.
Richard