On Thu, Dec 17, 2020 at 6:16 AM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Kyrylo Tkachov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > Hi all,
> >
> > While experimenting with some backend costs for Advanced SIMD and SVE I hit
> > many cases where GCC would pick SVE for VLA auto-vectorisation even when the
> > backend very clearly presented cheaper costs for Advanced SIMD.
> > For a simple float addition loop the SVE costs were:
> >
> > vec.c:9:21: note:  Cost model analysis:
> >   Vector inside of loop cost: 28
> >   Vector prologue cost: 2
> >   Vector epilogue cost: 0
> >   Scalar iteration cost: 10
> >   Scalar outside cost: 0
> >   Vector outside cost: 2
> >   prologue iterations: 0
> >   epilogue iterations: 0
> >   Minimum number of vector iterations: 1
> >   Calculated minimum iters for profitability: 4
> >
> > and for Advanced SIMD (Neon) they're:
> >
> > vec.c:9:21: note:  Cost model analysis:
> >   Vector inside of loop cost: 11
> >   Vector prologue cost: 0
> >   Vector epilogue cost: 0
> >   Scalar iteration cost: 10
> >   Scalar outside cost: 0
> >   Vector outside cost: 0
> >   prologue iterations: 0
> >   epilogue iterations: 0
> >   Calculated minimum iters for profitability: 0
> > vec.c:9:21: note:    Runtime profitability threshold = 4
>
> Just to expand on this for others on the list: this is comparing
> SVE with an estimated VL of 256 bits with Advanced SIMD at 128 bits,
> so for 8 floats it's 28 vs 22.
>
> For generic SVE we'd justify using VLA in that situation because the
> gap is relatively small and SVE would (according to the cost model)
> be a clear win beyond 256 bits.
>
> But in this case, the 256-bit VL estimate comes directly from a
> -mcpu/-mtune option, so it is more definite than the usual estimates
> for generic SVE.  What happens at larger VL is pretty much irrelevant
> in this case.
>
> > yet the SVE one was always picked. With guidance from Richard this seems to
> > be due to the vinfo comparisons in vect_better_loop_vinfo_p, in particular the
> > part with the big comment explaining the
> > estimated_rel_new * 2 <= estimated_rel_old heuristic.
> >
> > This patch extends the comparisons by introducing a three-way estimate
> > kind for poly_int values that the backend can distinguish.
> > This allows vect_better_loop_vinfo_p to ask for minimum, maximum and likely
> > estimates and pick Advanced SIMD over SVE when it is clearly cheaper.
> >
> > Bootstrapped and tested on aarch64-none-linux-gnu.
> > Manually checked that with reasonable separate costs for Advanced SIMD and SVE
> > GCC picks one over the other in ways I'd expect.
> >
> > Ok for trunk?
> > Thanks,
> > Kyrill
> >
> > gcc/
> >       * target.h (enum poly_value_estimate_kind): Define.
> >       (estimated_poly_value): Take an estimate kind argument.
> >       * target.def (estimated_poly_value): Update definition for the above.
> >       * doc/tm.texi: Regenerate.
> >       * tree-vect-loop.c (vect_better_loop_vinfo_p): Use min, max and likely
> >       estimates of VF to pick between vinfos.
> >       * config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Use
> >       estimated_poly_value instead of aarch64_estimated_poly_value.
> >       (aarch64_estimated_poly_value): Take a kind argument and handle it.
>
> OK, thanks.
>
> Richard
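
For readers following the "28 vs 22" comparison quoted above, here is a
minimal sketch of that arithmetic. It is not GCC code: cost_for_elements is
a hypothetical helper, and the only inputs are the two "Vector inside of
loop cost" figures from the dumps plus the stated vector lengths (256-bit
estimate for SVE, 128 bits for Advanced SIMD, 32-bit floats).

```cpp
#include <cassert>

// Hypothetical helper (not part of GCC): scale a per-iteration loop body
// cost to the cost of processing ELEMS elements, given how many elements
// one vector iteration handles.
constexpr int
cost_for_elements (int inside_cost, int elems_per_iter, int elems)
{
  return inside_cost * (elems / elems_per_iter);
}

// SVE at an estimated VL of 256 bits: 8 floats per iteration, cost 28.
constexpr int sve_cost = cost_for_elements (28, 8, 8);

// Advanced SIMD at 128 bits: 4 floats per iteration, cost 11,
// so two iterations are needed for the same 8 floats.
constexpr int advsimd_cost = cost_for_elements (11, 4, 8);

static_assert (sve_cost == 28, "one SVE iteration covers 8 floats");
static_assert (advsimd_cost == 22, "two Advanced SIMD iterations cover 8 floats");

// Neither cost is at least twice the other, so a "2x cheaper" style test
// does not separate the two candidates in either direction.
constexpr bool advsimd_twice_as_cheap = advsimd_cost * 2 <= sve_cost;
constexpr bool sve_twice_as_cheap = sve_cost * 2 <= advsimd_cost;

static_assert (!advsimd_twice_as_cheap && !sve_twice_as_cheap, "");
```

This only reproduces the numbers in the thread. How
vect_better_loop_vinfo_p combines them with the min, max and likely VF
estimates is defined by the patch itself and is not modelled here.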

I checked in this patch to fix bootstrap on Linux/x86.

-- 
H.J.
From 4a7a3110c70da8bad6978a36d9da3836538a0cc3 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.to...@gmail.com>
Date: Thu, 17 Dec 2020 11:00:02 -0800
Subject: [PATCH] Update default_estimated_poly_value prototype in targhooks.h

commit 64432b680eab0bddbe9a4ad4798457cf6a14ad60
Author: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
Date:   Thu Dec 17 18:02:37 2020 +0000

    vect, aarch64: Extend SVE vs Advanced SIMD costing decisions in vect_better_loop_vinfo_p

changed default_estimated_poly_value to

HOST_WIDE_INT
default_estimated_poly_value (poly_int64 x, poly_value_estimate_kind)
{
  return x.coeffs[0];
}

Update default_estimated_poly_value prototype in targhooks.h to match it.

	* targhooks.h (default_estimated_poly_value): Updated.
---
 gcc/targhooks.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 4542ba1b22d..4340a3b6222 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -221,7 +221,8 @@ extern int default_memory_move_cost (machine_mode, reg_class_t, bool);
 extern int default_register_move_cost (machine_mode, reg_class_t,
 				       reg_class_t);
 extern bool default_slow_unaligned_access (machine_mode, unsigned int);
-extern HOST_WIDE_INT default_estimated_poly_value (poly_int64);
+extern HOST_WIDE_INT default_estimated_poly_value (poly_int64,
+						   poly_value_estimate_kind);
 
 extern bool default_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT,
 						    unsigned int,
-- 
2.29.2
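
As an illustration of what the extra hook argument allows, the following is
a standalone sketch and not GCC source. Assumptions: the enumerator names
POLY_VALUE_MIN, POLY_VALUE_MAX and POLY_VALUE_LIKELY, the toy_poly_int64
type (a two-coefficient stand-in for poly_int64), and the
toy_tuned_estimated_poly_value function with its likely_x and max_x
parameters are all made up for this example. Only the behaviour of the
default hook (return x.coeffs[0] and ignore the kind) is taken from the
commit message above.

```cpp
#include <cassert>
#include <cstdint>

// Assumed enumerator names for the three-way estimate kind.
enum poly_value_estimate_kind
{
  POLY_VALUE_MIN,
  POLY_VALUE_MAX,
  POLY_VALUE_LIKELY
};

// Toy stand-in for poly_int64: represents coeffs[0] + coeffs[1] * X,
// where X >= 0 is only known at run time.
struct toy_poly_int64
{
  int64_t coeffs[2];
};

// Mirrors the default hook quoted in the commit message: the kind is
// ignored and the constant term is returned.
int64_t
toy_default_estimated_poly_value (toy_poly_int64 x, poly_value_estimate_kind)
{
  return x.coeffs[0];
}

// Hypothetical target-style override: LIKELY_X and MAX_X are the values of
// the runtime unknown X that the (imaginary) target considers likely and
// maximal.
int64_t
toy_tuned_estimated_poly_value (toy_poly_int64 x,
				poly_value_estimate_kind kind,
				int64_t likely_x, int64_t max_x)
{
  switch (kind)
    {
    case POLY_VALUE_MIN:
      return x.coeffs[0];
    case POLY_VALUE_MAX:
      return x.coeffs[0] + x.coeffs[1] * max_x;
    case POLY_VALUE_LIKELY:
      return x.coeffs[0] + x.coeffs[1] * likely_x;
    }
  return x.coeffs[0];
}
```

With a VF of 4 + 4X floats, likely_x = 1 corresponds to a 256-bit vector
(8 floats) and max_x = 15 to a 2048-bit vector (64 floats), while the
default hook reports 4 whatever kind is requested.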
