Hi all,

While experimenting with some backend costs for Advanced SIMD and SVE, I hit
many cases where GCC would pick SVE for VLA auto-vectorisation even when the
backend clearly reported cheaper costs for Advanced SIMD.
For a simple float addition loop, the SVE costs were:

vec.c:9:21: note:  Cost model analysis:
  Vector inside of loop cost: 28
  Vector prologue cost: 2
  Vector epilogue cost: 0
  Scalar iteration cost: 10
  Scalar outside cost: 0
  Vector outside cost: 2
  prologue iterations: 0
  epilogue iterations: 0
  Minimum number of vector iterations: 1
  Calculated minimum iters for profitability: 4
  
and for Advanced SIMD (Neon) they're:

vec.c:9:21: note:  Cost model analysis:
  Vector inside of loop cost: 11
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar iteration cost: 10
  Scalar outside cost: 0
  Vector outside cost: 0
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 0
vec.c:9:21: note:    Runtime profitability threshold = 4

yet the SVE one was always picked. With guidance from Richard, this seems to
be due to the vinfo comparisons in vect_better_loop_vinfo_p, in particular the
part with the big comment explaining the
estimated_rel_new * 2 <= estimated_rel_old heuristic.
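
For illustration, here is a simplified Python model of that single-estimate
rule applied to the two dumps above. It is not the GCC code. The VFs (4 for
Advanced SIMD, 4 + 4*x for SVE, where x counts the extra 128-bit granules),
the choice of SVE as the "old" candidate and the function names are all
assumptions made for the sketch.

```python
NEON_INSIDE_COST = 11  # "Vector inside of loop cost" from the Advanced SIMD dump
SVE_INSIDE_COST = 28   # "Vector inside of loop cost" from the SVE dump
NEON_VF = 4            # assumption: four 32-bit floats per 128-bit vector


def sve_vf(x):
    """Assumed SVE VF for 32-bit floats with 128 * (x + 1)-bit vectors."""
    return 4 + 4 * x


def old_rule_prefers_new(new_cost, new_vf, old_cost, old_vf):
    """Hypothetical single-estimate rule: compare new_cost / new_vf with
    old_cost / old_vf by cross-multiplying (no division), and switch to the
    new candidate only if it looks at least twice as cheap."""
    estimated_rel_new = new_cost * old_vf
    estimated_rel_old = old_cost * new_vf
    return estimated_rel_new * 2 <= estimated_rel_old


# Assumption: SVE is the "old" candidate and Advanced SIMD the "new" one.
for x in (0, 1):
    choice = old_rule_prefers_new(NEON_INSIDE_COST, NEON_VF,
                                  SVE_INSIDE_COST, sve_vf(x))
    print(f"estimated SVE VF {sve_vf(x)}:",
          "Advanced SIMD" if choice else "SVE")
```

In this model the verdict flips with whichever single VF estimate is used,
although per scalar iteration SVE only becomes cheaper from 384-bit vectors
upwards (28/12 ~= 2.33 against 11/4 = 2.75).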

This patch extends the comparisons by introducing a three-way estimate kind
(enum poly_value_estimate_kind) that estimated_poly_value now takes, so that
the backend can return different estimates for poly_int values.
This allows vect_better_loop_vinfo_p to ask for minimum, maximum and likely
estimates and pick Advanced SIMD over SVE when it is clearly cheaper.
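
As a rough illustration of how the three estimate kinds can be combined, here
is a small Python model. It is not the GCC implementation (the exact rule is in
the patch). PolyValueEstimateKind, new_is_better and likely_x are hypothetical
names. The sketch assumes VFs of the form c0 + c1 * x with x in 0..15 (SVE
vectors of 128 to 2048 bits) and, as one plausible policy, falls back to the
likely estimate only when the minimum and maximum estimates disagree.

```python
from enum import Enum


class PolyValueEstimateKind(Enum):
    """Hypothetical Python stand-in for enum poly_value_estimate_kind."""
    MIN = "min"
    MAX = "max"
    LIKELY = "likely"


MAX_EXTRA_GRANULES = 15  # SVE vectors are 128..2048 bits, so x is in 0..15


def estimated_poly_value(c0, c1, kind, likely_x=0):
    """Estimate the poly value c0 + c1 * x.  likely_x is a hypothetical
    stand-in for whatever per-core tuning says is the likely x."""
    lo = min(c0, c0 + c1 * MAX_EXTRA_GRANULES)
    hi = max(c0, c0 + c1 * MAX_EXTRA_GRANULES)
    if kind is PolyValueEstimateKind.MIN:
        return lo
    if kind is PolyValueEstimateKind.MAX:
        return hi
    return c0 + c1 * likely_x


def new_is_better(new_cost, new_vf, old_cost, old_vf, likely_x=0):
    """Return True if the new candidate has the lower cost per scalar
    iteration.  Each VF is a (c0, c1) pair.  new_cost / new_vf is compared
    with old_cost / old_vf by cross-multiplying, first at the minimum and
    maximum estimates and, only if those disagree, at the likely one."""
    def rel(kind):
        # Estimate both cross-multiplied costs with the same estimate kind.
        rel_new = new_cost * estimated_poly_value(*old_vf, kind, likely_x)
        rel_old = old_cost * estimated_poly_value(*new_vf, kind, likely_x)
        return rel_new, rel_old

    new_min, old_min = rel(PolyValueEstimateKind.MIN)
    new_max, old_max = rel(PolyValueEstimateKind.MAX)
    if new_min < old_min and new_max < old_max:
        return True   # new wins at both the smallest and largest VL
    if old_min < new_min and old_max < new_max:
        return False  # old wins at both the smallest and largest VL
    new_likely, old_likely = rel(PolyValueEstimateKind.LIKELY)
    return new_likely < old_likely


# Dump numbers: Advanced SIMD (cost 11, VF 4) as "new", SVE (cost 28,
# VF 4 + 4x) as "old".
print(new_is_better(11, (4, 0), 28, (4, 4), likely_x=0))  # True
print(new_is_better(11, (4, 0), 28, (4, 4), likely_x=3))  # False
```

With the dump numbers neither candidate wins at both extremes (Advanced SIMD
is cheaper at 128 bits, SVE at 2048 bits), so in this model the likely
estimate decides.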

Bootstrapped and tested on aarch64-none-linux-gnu.
I also checked manually that, with reasonable separate costs for Advanced SIMD
and SVE, GCC picks one over the other in the way I'd expect.

Ok for trunk?
Thanks,
Kyrill

gcc/
        * target.h (enum poly_value_estimate_kind): Define.
        (estimated_poly_value): Take an estimate kind argument.
        * target.def (estimated_poly_value): Update definition for the above.
        * doc/tm.texi: Regenerate.
        * tree-vect-loop.c (vect_better_loop_vinfo_p): Use min, max and likely
        estimates of VF to pick between vinfos.
        * config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Use
        estimated_poly_value instead of aarch64_estimated_poly_value.
        (aarch64_estimated_poly_value): Take a kind argument and handle it.

Attachment: vect-estimate.patch
