On 10 November 2015 at 20:11, Richard Biener <rguent...@suse.de> wrote: > On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote: > >> On 4 November 2015 at 20:35, Richard Biener <rguent...@suse.de> wrote: >> > >> > Btw, did you investigate code gen differences on x86_64/i586? That >> > target expands all divisions/modulo ops via divmod, relying on CSE >> > solely as the HW always computes both div and mod (IIRC). >> x86_64 has optab_handler for divmod defined, so the transform won't >> take place on x86. > > Ok. > >> > + >> > + gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs >> > (use_stmt), rhs); >> > + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); >> > >> > Ick. Please use >> > >> > gimple_set_rhs_from_tree (use_stmt, res); >> Um there doesn't seem to be gimple_set_rhs_from_tree. >> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt. >> Is that OK ? > > Yes. > >> > update_stmt (use_stmt); >> > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt)) >> > cfg_changed = true; >> > >> > + free_dominance_info (CDI_DOMINATORS); >> > >> > do not free dominators. >> >> I have done the suggested changes in the attached patch. >> I have a few questions: >> >> a) Does the change to insert DIVMOD call before topmost div or mod >> stmt with matching operands >> look correct ? > > + /* Insert call-stmt just before the topmost div/mod stmt. > + top_bb dominates all other basic blocks containing div/mod stms > + so, the topmost stmt would be the first div/mod stmt with matching > operands > + in top_bb. */ > + > + gcc_assert (top_bb != 0); > + gimple_stmt_iterator gsi; > + for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next > (&gsi)) > + { > + gimple *g = gsi_stmt (gsi); > + if (is_gimple_assign (g) > + && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR > + || gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR) > + && operand_equal_p (op1, gimple_assign_rhs1 (g), 0) > + && operand_equal_p (op2, gimple_assign_rhs2 (g), 0)) > + break; > > Looks overly complicated to me. Just remember "topmost" use_stmt > alongside top_bb (looks like you'll no longer need top_bb if you > retail top_stmt). And then do > > gsi = gsi_for_stmt (top_stmt); > > and insert before that. Thanks, done in this patch. Does it look OK ? IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if stmt1 occurs before stmt2 only if stmt1 and stmt2 are in the same basic block ? > >> b) Handling constants - I dropped handling constants in the attached >> patch. IIUC we don't want >> to enable this transform if there's a specialized expansion for some >> constants for div or mod ? > > See expand_divmod which has lots of special cases for constant operands > not requiring target support for div or mod. Thanks, would it be OK if I do this in follow up patch ? > >> I suppose this would also be target dependent and require a target hook ? >> For instance arm defines modsi3 pattern to expand mod when 2nd operand >> is constant and <= 0 or power of 2, >> while for other cases goes the expand_divmod() route to generate call >> to __aeabi_idivmod libcall. > > Ok, so it lacks a signed mod instruction. > >> c) Gating the divmod transform - >> I tried gating it on checks for optab_handlers on div and mod, however >> this doesn't enable transform for arm cortex-a9 >> anymore (cortex-a9 doesn't have hardware instructions for integer div and >> mod). >> IIUC for cortex-a9, >> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because >> HAVE_divsi3 is 0. >> However optab_handler (smod_optab, SImode) matches since optab_handler >> only checks for existence of pattern >> (and not whether the pattern gets matched). >> I suppose we should enable the transform only if the divmod, div, and >> mod pattern do not match rather than checking >> if the patterns exist via optab_handler ? For a general x % y, modsi3 >> would fail to match but optab_handler(smod_optab, SImode ) still >> says it's matched. > > Ah, of course. Querying for an optab handler is just a cheap > guesstimate... Not sure how to circumvent this best (sub-target > enablement of patterns). RTL expansion just goes ahead (of course) > and sees if expansion eventually fails. Richard? > >> Should we define a new target hook combine_divmod, which returns true >> if transforming to divmod is desirable for that >> target ? >> The default definition could be: >> bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2) >> { >> // check for optab_handlers for div/mod/divmod and libfunc for divmod >> } >> >> And for arm, it could be over-ridden to return false if op2 is >> constant and <= 0 or power of 2. >> I am not really sure if this is a good idea since I am replicating >> information from modsi3 pattern. >> Any change to the pattern may require corresponding change to the hook :/ > > Yeah, I don't think that is desirable. Ideally we'd have a way > to query HAVE_* for CODE_FOR_* which would mean target-insns.def > support for all div/mod/divmod patterns(?) and queries... > > Not sure if what would be enough though. > > Note that the divmod check is equally flawed. > > I think with the above I'd enable the transform when > > + if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing > + || (optab_libfunc (divmod_optab, mode) != NULL_RTX > && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing)) > + return false; Um this fails for the arm backend (for cortex-a9) because optab_handler (divmod_optab, mode) != CODE_FOR_nothing is false optab_libfunc (divmod_optab, mode) != NULL_RTX is true. optab_handler (div_optab, mode) == CODE_FOR_nothing is true. which comes down to false || (true && true) which is true and we hit return false. AFAIU, we want the transform to be disabled if: a) optab_handler exists for divmod. b) optab_handler exists for div. c) optab_libfunc does not exist for divmod. */
+ if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing + || optab_handler (div_optab, mode) != CODE_FOR_nothing + || optab_libfunc (divmod_optab, mode) == NULL_RTX) + return false; Does that look correct ? > > so we either will have a divmod instruction (hopefully not sub-target > disabled for us) or a libfunc for divmod and for sure no HW divide > instruction (HW mod can be emulated by HW divide but not the other > way around). > >> d) Adding effective-target-check for divmod: I just enabled it for >> arm*-*-* for now. I could additionally append more targets, >> not sure if this is the right approach. > > Looks good to me. Is this version OK if bootstrap/testing passes ? Thanks, Prathamesh > > Thanks, > Richard.
2015-11-11 Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> Kugan Vivekanandarajah <kugan.vivekanandara...@linaro.org> * internal-fn.def: Add entry for expand_DIVMOD. * internal-fn.c (expand_DIVMOD): New function. * tree-ssa-math-opts.c: Include optabs-libfuncs.h. (maybe_record_divmod): New function. (divmod_candidate_p): Likewise. (convert_to_divmod): Likewise. (widen_mul_stats): New member divmod_calls_inserted. (pass_optimize_widening_mul::execute): Call caluclate_dominace_info, call renumber_gimple_stmt_uids, and add check for TRUNC_MOD_EXPR. testsuite/ * gcc.dg/pr43721-1.c: New test. * gcc.dg/pr43721-2.c: Likewise. * gcc.dg/pr43721-3.c: Likewise. * gcc.dg/pr43721-4.c: Likewise. * lib/target-supports.exp: Add check for divmod.
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index a7da373..c0c84d8 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -2045,6 +2045,47 @@ expand_GOACC_LOOP (gcall *stmt ATTRIBUTE_UNUSED) gcc_unreachable (); } +/* Expand DIVMOD() using: + a) optab handler for udivmod/sdivmod if it is available. + b) If optab_handler doesn't exist, Generate call to + optab_libfunc for udivmod/sdivmod. */ + +static void +expand_DIVMOD (gcall *stmt) +{ + tree lhs = gimple_call_lhs (stmt); + tree arg0 = gimple_call_arg (stmt, 0); + tree arg1 = gimple_call_arg (stmt, 1); + + gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE); + tree type = TREE_TYPE (TREE_TYPE (lhs)); + machine_mode mode = TYPE_MODE (type); + optab tab = (TYPE_UNSIGNED (type)) ? udivmod_optab : sdivmod_optab; + + rtx op0 = expand_normal (arg0); + rtx op1 = expand_normal (arg1); + rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + + rtx libfunc = optab_libfunc (tab, mode); + gcc_assert (libfunc != NULL_RTX); + + machine_mode libval_mode = + smallest_mode_for_size (2 * GET_MODE_BITSIZE (mode), MODE_INT); + + rtx libval = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST, + libval_mode, 2, op0, mode, op1, mode); + + rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0); + rtx remainder = simplify_gen_subreg (mode, libval, libval_mode, + GET_MODE_SIZE (mode)); + + /* Wrap the return value (quotient, remaineder) within COMPLEX_EXPR */ + expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs), + make_tree (TREE_TYPE (arg0), quotient), + make_tree (TREE_TYPE (arg1), remainder)), + target, VOIDmode, EXPAND_NORMAL); +} + /* Routines to expand each internal function, indexed by function number. Each routine has the prototype: diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 78266d9..f28579b 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -83,3 +83,5 @@ DEF_INTERNAL_FN (GOACC_DIM_POS, ECF_PURE | ECF_NOTHROW | ECF_LEAF, ".") /* OpenACC looping abstraction. See internal-fn.h for usage. */ DEF_INTERNAL_FN (GOACC_LOOP, ECF_PURE | ECF_NOTHROW, NULL) + +DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL) diff --git a/gcc/testsuite/gcc.dg/pr43721-1.c b/gcc/testsuite/gcc.dg/pr43721-1.c new file mode 100644 index 0000000..2ec1118 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr43721-1.c @@ -0,0 +1,11 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ +/* { dg-require-effective-target divmod } */ + +int f(int x, int y) +{ + int quotient = x / y; + int remainder = x % y; + return quotient + remainder; +} + +/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */ diff --git a/gcc/testsuite/gcc.dg/pr43721-2.c b/gcc/testsuite/gcc.dg/pr43721-2.c new file mode 100644 index 0000000..974099c --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr43721-2.c @@ -0,0 +1,17 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ +/* { dg-require-effective-target divmod } */ + +int f(int x, int y) +{ + extern int early_exit; + + int quot = x / y; + + if (early_exit) + return 0; + + int rem = x % y; + return quot + rem; +} + +/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */ diff --git a/gcc/testsuite/gcc.dg/pr43721-3.c b/gcc/testsuite/gcc.dg/pr43721-3.c new file mode 100644 index 0000000..41cda2e --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr43721-3.c @@ -0,0 +1,18 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ +/* { dg-require-effective-target divmod } */ + +int f(int x, int y) +{ + extern int flag; + int quot; + + if (flag) + quot = x / y; + else + quot = 0; + + int rem = x % y; + return quot + rem; +} + +/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */ diff --git a/gcc/testsuite/gcc.dg/pr43721-4.c b/gcc/testsuite/gcc.dg/pr43721-4.c new file mode 100644 index 0000000..b4f69b0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr43721-4.c @@ -0,0 +1,19 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ +/* { dg-require-effective-target divmod } */ + +int f(int x, int y) +{ + int quot = 0; + int rem = 0; + + extern int flag; + + if (flag) + quot = x / y; + else + rem = x % y; + + return quot + rem; +} + +/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */ diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index b543519..c3895a7 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -6494,3 +6494,12 @@ proc check_effective_target_vect_max_reduc { } { } return 0 } + +# Return 1 if the target supports divmod + +proc check_effective_target_divmod { } { + if { [istarget arm*-*-*] } { + return 1 + } + return 0 +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 41fcabf..bd6dc9c 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -110,6 +110,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa.h" #include "builtins.h" #include "params.h" +#include "optabs-libfuncs.h" +#include "tree-eh.h" /* This structure represents one basic block that either computes a division, or is a common dominator for basic block that compute a @@ -182,6 +184,9 @@ static struct /* Number of fp fused multiply-add ops inserted. */ int fmas_inserted; + + /* Number of divmod calls inserted. */ + int divmod_calls_inserted; } widen_mul_stats; /* The instance of "struct occurrence" representing the highest @@ -3493,6 +3498,162 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2) return true; } +/* Set top_stmt to topmost stmt between top_stmt and use_stmt, and add + use_stmt to vector use_stmts provided basic block containing top_stmt + dominates use_stmt or vice versa. */ + +static void +maybe_record_divmod (vec<gimple *>& stmts, gimple *&top_stmt, gimple *use_stmt) +{ + basic_block bb = gimple_bb (use_stmt); + basic_block top_bb = gimple_bb (top_stmt); + + if (dominated_by_p (CDI_DOMINATORS, top_bb, bb)) + { + if (bb != top_bb + || gimple_uid (use_stmt) < gimple_uid (top_stmt)) + top_stmt = use_stmt; + } + else if (!dominated_by_p (CDI_DOMINATORS, bb, top_bb)) + return; + + stmts.safe_push (use_stmt); +} + +/* Check if the stmt is a candidate for divmod transform. */ + +static bool +divmod_candidate_p (gassign *stmt) +{ + enum machine_mode mode = TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))); + const_tree type = TREE_TYPE (gimple_assign_lhs (stmt)); + optab divmod_optab, div_optab; + + if (TYPE_UNSIGNED (type)) + { + divmod_optab = udivmod_optab; + div_optab = udiv_optab; + } + else + { + divmod_optab = sdivmod_optab; + div_optab = sdiv_optab; + } + + /* Disable the transform if: + a) optab_handler exists for divmod. + b) optab_handler exists for div. + c) optab_libfunc does not exist for divmod. */ + + if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing + || optab_handler (div_optab, mode) != CODE_FOR_nothing + || optab_libfunc (divmod_optab, mode) == NULL_RTX) + return false; + + tree op1 = gimple_assign_rhs1 (stmt); + tree op2 = gimple_assign_rhs2 (stmt); + + /* FIXME: Drop handling constants for now. */ + if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2)) + return false; + + if (TYPE_OVERFLOW_TRAPS (type)) + return false; + + return true; +} + +/* This function looks for: + t1 = a TRUNC_DIV_EXPR b; + t2 = a TRUNC_MOD_EXPR b; + and transforms it to the following sequence: + complex_tmp = DIVMOD (a, b); + t1 = REALPART_EXPR(a); + t2 = IMAGPART_EXPR(b); + This change is done only if the target has support for divmod. + + The pass works in two phases: + 1) Walk through all immediate uses of stmt's operand and find a + TRUNC_DIV_EXPR with matching operands and if such a stmt is found add + it to stmts vector. + 2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that + dominates other div/mod stmts with same operands) and update entries in + stmts vector to use return value of DIMOVD (REALEXPR_PART for div, + IMAGPART_EXPR for mod). */ + +static bool +convert_to_divmod (gassign *stmt) +{ + if (!divmod_candidate_p (stmt)) + return false; + + tree op1 = gimple_assign_rhs1 (stmt); + tree op2 = gimple_assign_rhs2 (stmt); + + vec<gimple *> stmts = vNULL; + stmts.safe_push (stmt); + + imm_use_iterator use_iter; + gimple *use_stmt; + gimple *top_stmt = stmt; + + FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1) + { + if (is_gimple_assign (use_stmt) + && gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR + && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0) + && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0)) + maybe_record_divmod (stmts, top_stmt, use_stmt); + } + + if (stmts.length () == 1) + return false; + + /* Create the library call and insert the call stmt before top_stmt. */ + gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2); + tree res = make_temp_ssa_name ( + build_complex_type (TREE_TYPE (gimple_assign_lhs (stmt))), + call_stmt, "divmod_tmp"); + + gimple_call_set_lhs (call_stmt, res); + gimple_stmt_iterator top_stmt_gsi = gsi_for_stmt (top_stmt); + gsi_insert_before (&top_stmt_gsi, call_stmt, GSI_SAME_STMT); + + widen_mul_stats.divmod_calls_inserted++; + + /* Update stmts. */ + bool cfg_changed = false; + for (unsigned i = 0; i < stmts.length (); ++i) + { + tree rhs; + use_stmt = stmts[i]; + + switch (gimple_assign_rhs_code (use_stmt)) + { + case TRUNC_DIV_EXPR: + rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res); + break; + + case TRUNC_MOD_EXPR: + rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op1), res); + break; + + default: + gcc_unreachable (); + } + + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); + gimple_assign_set_rhs_from_tree (&gsi, rhs); + update_stmt (use_stmt); + + if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt)) + cfg_changed = true; + } + + stmts.release (); + return cfg_changed; +} + /* Find integer multiplications where the operands are extended from smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR where appropriate. */ @@ -3535,7 +3696,10 @@ pass_optimize_widening_mul::execute (function *fun) basic_block bb; bool cfg_changed = false; + calculate_dominance_info (CDI_DOMINATORS); + memset (&widen_mul_stats, 0, sizeof (widen_mul_stats)); + renumber_gimple_stmt_uids (); FOR_EACH_BB_FN (bb, fun) { @@ -3563,6 +3727,10 @@ pass_optimize_widening_mul::execute (function *fun) } break; + case TRUNC_MOD_EXPR: + convert_to_divmod (as_a<gassign *> (stmt)); + break; + case PLUS_EXPR: case MINUS_EXPR: convert_plusminus_to_widen (&gsi, stmt, code); @@ -3614,6 +3782,8 @@ pass_optimize_widening_mul::execute (function *fun) widen_mul_stats.maccs_inserted); statistics_counter_event (fun, "fused multiply-adds inserted", widen_mul_stats.fmas_inserted); + statistics_counter_event (fun, "divmod calls inserted", + widen_mul_stats.divmod_calls_inserted); return cfg_changed ? TODO_cleanup_cfg : 0; }