On Mon, 19 May 2025, Tamar Christina wrote:

> > > +-param=vect-scalar-cost-multiplier=
> > > +Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(1)
> > IntegerRange(0, 100000) Param Optimization
> > > +The scaling multiplier to add to all scalar loop costing when performing
> > vectorization profitability analysis. The default value is 1.
> > > +
> >
> > Note this only allows whole number scaling. May I suggest to instead
> > use percentage as unit, thus the multiplier is --param
> > param_vect_scalar_cost_multiplier / 100?
> >
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
>
> Ok for master?

OK.

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * params.opt (vect-scalar-cost-multiplier): New.
> * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Use it.
> * doc/invoke.texi (vect-scalar-cost-multiplier): Document it.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/sve/cost_model_16.c: New test.
>
> -- inline copy of patch --
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index
> 699ee1cc0b7580d4729bbefff8f897eed1c3e49b..95a25c0f63b77f26db05a7b48bfad8f9c58bcc5f
> 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17273,6 +17273,10 @@ this parameter. The default value of this parameter
> is 50.
>  @item vect-induction-float
>  Enable loop vectorization of floating point inductions.
>
> +@item vect-scalar-cost-multiplier
> +Apply the given multiplier % to scalar loop costing during vectorization.
> +Increasing the cost multiplier will make vector loops more profitable.
> +
>  @item vrp-block-limit
>  Maximum number of basic blocks before VRP switches to a lower memory
>  algorithm.
>
> diff --git a/gcc/params.opt b/gcc/params.opt
> index
> 1f0abeccc4b9b439ad4a4add6257b4e50962863d..a67f900a63f7187b1daa593fe17cd88f2fc32367
> 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1253,6 +1253,10 @@ The maximum factor which the loop vectorizer applies
> to the cost of statements i
>  Common Joined UInteger Var(param_vect_induction_float) Init(1)
> IntegerRange(0, 1) Param Optimization
>  Enable loop vectorization of floating point inductions.
>
> +-param=vect-scalar-cost-multiplier=
> +Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(100)
> IntegerRange(0, 10000) Param Optimization
> +The scaling multiplier as a percentage to apply to all scalar loop costing
> when performing vectorization profitability analysis. The default value is
> 100.
> +
>  -param=vrp-block-limit=
>  Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization
> Param
>  Maximum number of basic blocks before VRP switches to a fast model with less
> memory requirements.
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
> b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
> new file mode 100644
> index
> 0000000000000000000000000000000000000000..c405591a101d50b4734bc6d65a6d6c01888bea48
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -march=armv8-a+sve -mmax-vectorization
> -fdump-tree-vect-details" } */
> +
> +void
> +foo (char *restrict a, int *restrict b, int *restrict c,
> +     int *restrict d, int stride)
> +{
> +  if (stride <= 1)
> +    return;
> +
> +  for (int i = 0; i < 3; i++)
> +    {
> +      int res = c[i];
> +      int t = b[i * stride];
> +      if (a[i] != 0)
> +        res = t * d[i];
> +      c[i] = res;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index
> fe6f3cf188e40396b299ff9e814cc402bc2d4e2d..c18e75794046f506c473b36639e6ae6658a5516b
> 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4646,7 +4646,8 @@ vect_estimate_min_profitable_iters (loop_vec_info
> loop_vinfo,
>       TODO: Consider assigning different costs to different scalar
>       statements. */
>
> -  scalar_single_iter_cost = loop_vinfo->scalar_costs->total_cost ();
> +  scalar_single_iter_cost = (loop_vinfo->scalar_costs->total_cost ()
> +                             * param_vect_scalar_cost_multiplier) / 100;
>
>    /* Add additional cost for the peeled instructions in prologue and epilogue
>       loop. (For fully-masked loops there will be no peeling.)
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
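
For illustration only, the arithmetic in the tree-vect-loop.cc hunk can be sketched outside of GCC as below. The helper name scale_scalar_cost and its unsigned int types are assumptions made for this sketch and are not part of the patch; only the multiply-then-divide-by-100 order and the 0..10000 range are taken from the quoted diff.

```c
#include <assert.h>

/* Hypothetical helper, not a GCC function: scale a scalar loop cost by a
   percentage the same way the hunk above does, i.e. multiply first and then
   divide by 100 using truncating integer division.  */
static unsigned int
scale_scalar_cost (unsigned int total_cost, unsigned int multiplier_percent)
{
  /* Upper bound taken from IntegerRange(0, 10000) in the params.opt hunk.  */
  assert (multiplier_percent <= 10000);
  return (total_cost * multiplier_percent) / 100;
}
```

With a made-up scalar cost of 8, a multiplier of 100 (the default) leaves the cost at 8, 150 raises it to 12, and 50 lowers it to 4. Because the division truncates, a cost of 3 at 50 becomes 1 rather than 1.5, and a multiplier of 0 reduces any cost to 0.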