On Fri, 9 Jul 2021, Jiufu Guo wrote:

> Currently, doloop.xx variable is using the type as niter which may shorter
> than word size.  For some cases, it may be better to use word size type.
> For example, on some 64bit system, to access 32bit niter, subreg maybe used.
> Then using 64bit type would not need to use subreg if the value can be
> present in both 32bit and 64bit.
>
> This patch updates doloop iv to BIT_PER_WORD size if it is fine.
>
> Bootstrap and regtest pass on powerpc64le and x86, is this ok for trunk?
>
> BR.
> Jiufu
>
> gcc/ChangeLog:
>
> 2021-07-08  Jiufu Guo  <guoji...@linux.ibm.com>
>
>       PR target/61837
>       * tree-ssa-loop-ivopts.c (add_iv_candidate_for_doloop):
>       Update iv on BITS_PER_WORD for niter.
>
> gcc/testsuite/ChangeLog:
>
> 2021-07-08  Jiufu Guo  <guoji...@linux.ibm.com>
>
>       PR target/61837
>       * gcc.target/powerpc/pr61837.c: New test.
>
> ---
>  gcc/testsuite/gcc.target/powerpc/pr61837.c | 16 ++++++++++++++++
>  gcc/tree-ssa-loop-ivopts.c                 | 10 ++++++++++
>  2 files changed, 26 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr61837.c
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr61837.c
> b/gcc/testsuite/gcc.target/powerpc/pr61837.c
> new file mode 100644
> index 00000000000..dc44eb9cb41
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr61837.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +void foo(int *p1, long *p2, int s)
> +{
> +  int n, v, i;
> +
> +  v = 0;
> +  for (n = 0; n <= 100; n++) {
> +     for (i = 0; i < s; i++)
> +        if (p2[i] == n)
> +           p1[i] = v;
> +     v += 88;
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not {\mrldicl\M} } } */
> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> index 12a8a49a307..c3c2f97918d 100644
> --- a/gcc/tree-ssa-loop-ivopts.c
> +++ b/gcc/tree-ssa-loop-ivopts.c
> @@ -5690,6 +5690,16 @@ add_iv_candidate_for_doloop (struct ivopts_data *data)
>
>    tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter),
>                             build_int_cst (ntype, 1));
> +
> +  /* Use type in word size may fast.  */
> +  if (TYPE_PRECISION (ntype) < BITS_PER_WORD
> +      && TYPE_PRECISION (long_unsigned_type_node) == BITS_PER_WORD
> +      && wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype))))

I wonder if there's a way to query the target for which modes the doloop
pattern can handle (I'm not too familiar with the doloop code).

Why do you need to do any checks besides the new type being able to
represent all IV values?  The original doloop IV will never wrap.  (OTOH,
if niter is U*_MAX then we compute niter + 1, which becomes zero ...
I suppose the doloop might still do the correct thing here, but it will
also still do so with an IV of a larger type.)

I'd have expected something like

   ntype = lang_hooks.types.type_for_mode (word_mode, TYPE_UNSIGNED (ntype));

so that the decision is made using a mode, which is also why I wonder
if there's a way to query the target for this.  As you say, it _may_ be
fast, so better check (somehow).

> +    {
> +      ntype = long_unsigned_type_node;
> +      base = fold_convert (ntype, base);
> +    }
> +
>    add_candidate (data, base, build_int_cst (ntype, -1), true, NULL, NULL,
>                   true);
>  }
>
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
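
To illustrate the wrap-around remark in the reply above, here is a minimal C sketch, not GCC code. It assumes a 32-bit niter type and a 64-bit word; the function names are hypothetical stand-ins for the tree expressions in the patch. The sketch shows that niter + 1 becomes zero in the narrow type when niter is at its maximum, that the same sum is representable in the wider type, and that converting after the narrow addition (the order used in the patch) preserves the value only when niter's maximum is below the narrow type's maximum.

```c
#include <stdint.h>

/* Hypothetical stand-in: doloop base "niter + 1" computed in a 32-bit
   unsigned type.  The result wraps modulo 2^32.  */
uint32_t base_narrow (uint32_t niter)
{
  return (uint32_t) (niter + 1u);
}

/* Hypothetical stand-in: the same base computed in a 64-bit unsigned
   type, which can represent every 32-bit niter value plus one.  */
uint64_t base_wide (uint32_t niter)
{
  return (uint64_t) niter + 1u;
}

/* The order used in the patch: add in the narrow type first, then
   convert the result to the wide type.  */
uint64_t base_add_then_widen (uint32_t niter)
{
  return (uint64_t) base_narrow (niter);
}

/* Assumed reading of the patch's wi::ltu_p guard: add-then-widen gives
   the same value as the wide computation only if the maximum niter is
   strictly below the narrow type's maximum value.  */
int add_then_widen_is_exact (uint32_t niter_max)
{
  return niter_max < UINT32_MAX;
}
```

With these definitions, `base_narrow (UINT32_MAX)` is 0 while `base_wide (UINT32_MAX)` is 4294967296, so the two orders disagree only at the narrow type's maximum, which is the case the guard excludes.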