On Thu, Dec 17, 2020 at 6:16 AM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Kyrylo Tkachov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > Hi all,
> >
> > While experimenting with some backend costs for Advanced SIMD and SVE I hit
> > many cases where GCC would pick SVE for VLA auto-vectorisation even when the
> > backend very clearly presented cheaper costs for Advanced SIMD.
> > For a simple float addition loop the SVE costs were:
> >
> > vec.c:9:21: note:  Cost model analysis:
> >   Vector inside of loop cost: 28
> >   Vector prologue cost: 2
> >   Vector epilogue cost: 0
> >   Scalar iteration cost: 10
> >   Scalar outside cost: 0
> >   Vector outside cost: 2
> >   prologue iterations: 0
> >   epilogue iterations: 0
> >   Minimum number of vector iterations: 1
> >   Calculated minimum iters for profitability: 4
> >
> > and for Advanced SIMD (Neon) they're:
> >
> > vec.c:9:21: note:  Cost model analysis:
> >   Vector inside of loop cost: 11
> >   Vector prologue cost: 0
> >   Vector epilogue cost: 0
> >   Scalar iteration cost: 10
> >   Scalar outside cost: 0
> >   Vector outside cost: 0
> >   prologue iterations: 0
> >   epilogue iterations: 0
> >   Calculated minimum iters for profitability: 0
> > vec.c:9:21: note:  Runtime profitability threshold = 4
>
> Just to expand on this for others on the list: this is comparing
> SVE with an estimated VL of 256 bits with Advanced SIMD at 128 bits,
> so for 8 floats it's 28 vs 22.
>
> For generic SVE we'd justify using VLA in that situation because the
> gap is relatively small and SVE would (according to the cost model)
> be a clear win beyond 256 bits.
>
> But in this case, the 256-bit VL estimate comes directly from a
> -mcpu/-mtune option, so it is more definite than the usual estimates
> for generic SVE.  What happens at larger VL is pretty much irrelevant
> in this case.
>
> > yet the SVE one was always picked.  With guidance from Richard this seems to
> > be due to the vinfo comparisons in vect_better_loop_vinfo_p, in particular
> > the part with the big comment explaining the
> > estimated_rel_new * 2 <= estimated_rel_old heuristic.
> >
> > This patch extends the comparisons by introducing a three-way estimate
> > kind for poly_int values that the backend can distinguish.
> > This allows vect_better_loop_vinfo_p to ask for minimum, maximum and likely
> > estimates and pick Advanced SIMD over SVE when it is clearly cheaper.
> >
> > Bootstrapped and tested on aarch64-none-linux-gnu.
> > Manually checked that with reasonable separate costs for Advanced SIMD and
> > SVE GCC picks one over the other in ways I'd expect.
> >
> > Ok for trunk?
> > Thanks,
> > Kyrill
> >
> > gcc/
> >     * target.h (enum poly_value_estimate_kind): Define.
> >     (estimated_poly_value): Take an estimate kind argument.
> >     * target.def (estimated_poly_value): Update definition for the above.
> >     * doc/tm.texi: Regenerate.
> >     * tree-vect-loop.c (vect_better_loop_vinfo_p): Use min, max and likely
> >     estimates of VF to pick between vinfos.
> >     * config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Use
> >     estimated_poly_value instead of aarch64_estimated_poly_value.
> >     (aarch64_estimated_poly_value): Take a kind argument and handle it.
>
> OK, thanks.
>
> Richard
I checked in this patch to fix bootstrap on Linux/x86.

-- 
H.J.
From 4a7a3110c70da8bad6978a36d9da3836538a0cc3 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.to...@gmail.com>
Date: Thu, 17 Dec 2020 11:00:02 -0800
Subject: [PATCH] Update default_estimated_poly_value prototype in targhooks.h

commit 64432b680eab0bddbe9a4ad4798457cf6a14ad60
Author: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
Date:   Thu Dec 17 18:02:37 2020 +0000

    vect, aarch64: Extend SVE vs Advanced SIMD costing decisions in vect_better_loop_vinfo_p

changed default_estimated_poly_value to

HOST_WIDE_INT
default_estimated_poly_value (poly_int64 x, poly_value_estimate_kind)
{
  return x.coeffs[0];
}

Update default_estimated_poly_value prototype in targhooks.h to match it.

    * targhooks.h (default_estimated_poly_value): Updated.
---
 gcc/targhooks.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 4542ba1b22d..4340a3b6222 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -221,7 +221,8 @@ extern int default_memory_move_cost (machine_mode, reg_class_t, bool);
 extern int default_register_move_cost (machine_mode, reg_class_t,
                                        reg_class_t);
 extern bool default_slow_unaligned_access (machine_mode, unsigned int);
-extern HOST_WIDE_INT default_estimated_poly_value (poly_int64);
+extern HOST_WIDE_INT default_estimated_poly_value (poly_int64,
+                                                   poly_value_estimate_kind);
 
 extern bool default_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT,
                                                     unsigned int,
-- 
2.29.2
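As a rough illustration of what the extra kind argument makes possible, here is a standalone sketch. It is not the GCC implementation: poly2 is a simplified stand-in for poly_int64, the enumerator names are assumptions, and target_estimate with its max_n and likely_n parameters is a hypothetical target hook. Only default_estimate mirrors the default quoted in the commit message above, which ignores the kind and returns coeffs[0].

```cpp
#include <cstdint>

// Simplified stand-in for poly_int64: the value is
// coeffs[0] + coeffs[1] * N, where N is only known at run time.
struct poly2
{
  int64_t coeffs[2];
};

// Enumerator names are assumptions made for this sketch.
enum estimate_kind
{
  ESTIMATE_MIN,
  ESTIMATE_MAX,
  ESTIMATE_LIKELY
};

// Mirrors the default hook quoted above: the kind is ignored and the
// constant coefficient is returned.
static int64_t
default_estimate (poly2 x, estimate_kind)
{
  return x.coeffs[0];
}

// Hypothetical target hook: N ranges over 0..MAX_N and the tuning says
// LIKELY_N is the expected value.  Assumes coeffs[1] is non-negative,
// so that N == 0 really is the minimum.
static int64_t
target_estimate (poly2 x, estimate_kind kind, int64_t max_n,
                 int64_t likely_n)
{
  switch (kind)
    {
    case ESTIMATE_MIN:
      return x.coeffs[0];
    case ESTIMATE_MAX:
      return x.coeffs[0] + x.coeffs[1] * max_n;
    case ESTIMATE_LIKELY:
      return x.coeffs[0] + x.coeffs[1] * likely_n;
    }
  return x.coeffs[0];
}
```

For example, taking a vectorisation factor of 4 + 4N floats with max_n = 15 and likely_n = 1 (illustrative values for a 256-bit tuning), the sketch returns 4, 64 and 8 for the minimum, maximum and likely estimates, while default_estimate returns 4 for every kind.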