"Kewen.Lin" <li...@linux.ibm.com> writes: > Hi Richard, > > on 2022/11/24 17:24, Richard Sandiford wrote: >> "Kewen.Lin" <li...@linux.ibm.com> writes: >>> Hi, >>> >>> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD, >>> STORE} into normal vector load/store if the given length is known >>> to be equal to the length of the whole vector. It would help to >>> improve overall cycles as normally the latency of vector access >>> with length in bytes is bigger than normal vector access, and it >>> also saves the preparation for length if constant length can not >>> be encoded into instruction (such as on power). >>> >>> Bootstrapped and regtested on x86_64-redhat-linux, >>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu. >>> >>> Is it ok for trunk? >>> >>> BR, >>> Kewen >>> ----- >>> PR tree-optimization/107412 >>> >>> gcc/ChangeLog: >>> >>> * gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ... >>> (gimple_fold_partial_load_store_mem_ref): ... this, add one parameter >>> mask_p indicating it's for mask or length, and add some handlings for >>> IFN LEN_{LOAD,STORE}. >>> (gimple_fold_mask_load): Rename to ... >>> (gimple_fold_partial_load): ... this, add one parameter mask_p. >>> (gimple_fold_mask_store): Rename to ... >>> (gimple_fold_partial_store): ... this, add one parameter mask_p. >>> (gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE}, >>> and adjust calls on gimple_fold_mask_load_store_mem_ref to >>> gimple_fold_partial_load_store_mem_ref. >> >> Sorry to reply to late (still catching up on email), but: >> >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/powerpc/pr107412.c: New test. >>> * gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for >>> folded LEN_LOAD. >>> --- >>> gcc/gimple-fold.cc | 57 ++++++++++++++----- >>> .../gcc.target/powerpc/p9-vec-length-epil-8.c | 2 +- >>> gcc/testsuite/gcc.target/powerpc/pr107412.c | 19 +++++++ >>> 3 files changed, 64 insertions(+), 14 deletions(-) >>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c >>> >>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc >>> index a1704784bc9..e3a087defa6 100644 >>> --- a/gcc/gimple-fold.cc >>> +++ b/gcc/gimple-fold.cc >>> @@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree >>> type, >>> return wi::min_precision (wres, sign) > TYPE_PRECISION (type); >>> } >>> >>> -/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF >>> +/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a >>> MEM_REF >>> for the memory it references, otherwise return null. VECTYPE is the >>> - type of the memory vector. */ >>> + type of the memory vector. MASK_P indicates it's for MASK if true, >>> + otherwise it's for LEN. */ >>> >>> static tree >>> -gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype) >>> +gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool >>> mask_p) >>> { >>> tree ptr = gimple_call_arg (call, 0); >>> tree alias_align = gimple_call_arg (call, 1); >>> - tree mask = gimple_call_arg (call, 2); >>> - if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask)) >>> + if (!tree_fits_uhwi_p (alias_align)) >>> return NULL_TREE; >>> >>> + if (mask_p) >>> + { >>> + tree mask = gimple_call_arg (call, 2); >>> + if (!integer_all_onesp (mask)) >>> + return NULL_TREE; >>> + } else { >> >> Minor nit: }, else, and { should be on separate lines. But the thing >> I actually wanted to say was... > > Thanks for catching, I must have forgotten to reformat these lines. > >> >>> + tree basic_len = gimple_call_arg (call, 2); >>> + if (!tree_fits_uhwi_p (basic_len)) >>> + return NULL_TREE; >>> + unsigned int nargs = gimple_call_num_args (call); >>> + tree bias = gimple_call_arg (call, nargs - 1); >>> + gcc_assert (tree_fits_uhwi_p (bias)); >>> + tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias); >>> + unsigned int len = tree_to_uhwi (biased_len); >>> + unsigned int vect_len >>> + = GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant (); >>> + if (vect_len != len) >>> + return NULL_TREE; >> >> Using "unsigned int" truncates the value. I realise that's probably >> safe in this context, since large values have undefined behaviour. >> But it still seems better to use an untruncated type, so that it >> looks less like an oversight. (I think this is one case where "auto" >> can help, since it gets the type right automatically.) >> >> It would also be better to avoid the to_constant, since we haven't >> proven is_constant. How about: >> >> tree basic_len = gimple_call_arg (call, 2); >> if (!poly_int_tree_p (basic_len)) >> return NULL_TREE; >> unsigned int nargs = gimple_call_num_args (call); >> tree bias = gimple_call_arg (call, nargs - 1); >> gcc_assert (TREE_CODE (bias) == INTEGER_CST); >> if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias), >> GET_MODE_SIZE (TYPE_MODE (vectype)))) >> return NULL_TREE; >> >> which also avoids using tree arithmetic for the subtraction? > > I agree your proposed code has better robustness, thanks! > > Sorry that the original patch was committed, I made a patch as attached. > It's bootstrapped and regresss-tested on powerpc64-linux-gnu P8, and > powerpc64le-linux-gnu P9 and P10. > > Is it ok for trunk?
OK, thanks. Richard > BR, > Kewen > > From 3984a7f86a35d13e1fd40bc0c12ed5ad5b234047 Mon Sep 17 00:00:00 2001 > From: Kewen Lin <li...@linux.ibm.com> > Date: Sun, 27 Nov 2022 20:29:57 -0600 > Subject: [PATCH] gimple-fold: Refine gimple_fold_partial_load_store_mem_ref > > Following Richard's review comments, this patch is to use > untruncated type for the length used for IFN_LEN_{LOAD,STORE} > instead of "unsigned int" for better robustness. It also > avoid to use to_constant and tree arithmetic for subtraction. > > Co-authored-by: Richard Sandiford <richard.sandif...@arm.com> > > gcc/ChangeLog: > > * gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Use > untruncated type for the length, and avoid to_constant and tree > arithmetic for subtraction. > --- > gcc/gimple-fold.cc | 15 +++++++-------- > 1 file changed, 7 insertions(+), 8 deletions(-) > > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc > index c2d9c806aee..88d14c7adcc 100644 > --- a/gcc/gimple-fold.cc > +++ b/gcc/gimple-fold.cc > @@ -5387,18 +5387,17 @@ gimple_fold_partial_load_store_mem_ref (gcall *call, > tree vectype, bool mask_p) > tree mask = gimple_call_arg (call, 2); > if (!integer_all_onesp (mask)) > return NULL_TREE; > - } else { > + } > + else > + { > tree basic_len = gimple_call_arg (call, 2); > - if (!tree_fits_uhwi_p (basic_len)) > + if (!poly_int_tree_p (basic_len)) > return NULL_TREE; > unsigned int nargs = gimple_call_num_args (call); > tree bias = gimple_call_arg (call, nargs - 1); > - gcc_assert (tree_fits_shwi_p (bias)); > - tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias); > - unsigned int len = tree_to_uhwi (biased_len); > - unsigned int vect_len > - = GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant (); > - if (vect_len != len) > + gcc_assert (TREE_CODE (bias) == INTEGER_CST); > + if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias), > + GET_MODE_SIZE (TYPE_MODE (vectype)))) > return NULL_TREE; > }