On 30 October 2015 at 15:57, Richard Biener <richard.guent...@gmail.com> wrote: > On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni > <prathamesh.kulka...@linaro.org> wrote: >> Hi, >> I have attached revamped version of Kugan's patch >> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721. >> Test-case: http://pastebin.com/QMfpXLD9 >> divmod pass dump: http://pastebin.com/yMY1ikCp >> Assembly: http://pastebin.com/kk2HZpvA >> >> The approach I took is similar to sincos pass, which involves two parts: >> a) divmod pass that transforms: >> t1 = a / b; >> t2 = a % b; >> to the following sequence: >> complex_tmp = DIVMOD (a, b); >> t1 = REALPART_EXPR(a); >> t2 = IMAGPART_EXPR(b); >> >> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by >> expand_DIVMOD(). >> I am not sure if I have done this correctly. I was somehow looking to >> reuse expand_divmod() but am not able to figure out how to do that >> (AFAIU expand_divmod() strictly returns either the quotient or >> remainder but never both which is what I want), so ended up with >> manually expanding to rtx. >> >> While going through the sincos pass in execute_cse_sincos_1, I didn't >> understand the following: >> if (gimple_purge_dead_eh_edges (gimple_bb (stmt))) >> cfg_changed = true; >> Um why is the call to gimple_purge_dead_eh_edges necessary here? > > The call replacement might no longer throw. > >> A silly question, what options to pass to gcc to print statistics ? I >> added statistics_counter_event to keep track of number of calls >> inserted/used but not sure how to print them :/ > > -fdump-tree-XXX-stats or -fdump-statistics-stats Thanks, I can see it now -;) > >> I would be grateful for suggestions for improving the patch. > > First of all new functions need comments (expand_DIVMOD). > > Second - what's the reasoning for the pass placement seen? I think > the transform is a lowering one, improving RTL expansion. The > DIVMOD representation is harmful for GIMPLE optimizers. > > Third - I'd have integrated this with an existing pass - we have another > lowering / RTL expansion kind pass which is pass_optimize_widening_mul. > Please piggy-back on it. > > You seem to do the transform unconditionally even if the target has > instructions for division and modulus but no instruction for divmod > in which case you'll end up emitting a library call in the end instead > of a div and mod instruction. I think the transform should be gated on > missing div/mod instructions or the availability of a divmod one. > > + if (is_gimple_assign (stmt) > + && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == > tcc_binary) > + { > + if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR) > + cfg_changed |= execute_divmod_1 (stmt); > > you can directly check gimple_assign_rhs_code. I'd check for TRUNC_MOD_EXPR > which seems to be less common and thus should cut down compile-time. > > + /* Case 2: When both are in top_bb */ > + else if (op1_def_in_bb && op2_def_in_bb) > + { > + /* FIXME: better way to compare iterators?. */ > + gimple_stmt_iterator gsi = get_later_stmt (top_bb, > def_stmt_op1, def_stmt_op2); > > You can't use get_later_stmt, it's a vectorizer speciality. To do > this test efficiently > you need stmt UIDs computed. > > I miss a testcase (or two). I have tried to address your comments in the attached patch. Could you please review it for me ?
I have a few questions: a) Is the check for availability of divmod correct ? b) Assume TRUNC_DIV_EXPR stmt is in bb0 and TRUNC_DIV_MOD in bb1. I choose to transform if bb0 dominates bb1 or bb1 dominates bb0 (or bb0 == bb1). However I wonder if we should also check that if bb0 dominates bb1 then bb1 should post-dominate bb0 ? For instance the transform gets applied to test-case in pr43721-2.c. bb containing rem = x % y doesn't post-dominate bb containing quot = x / y; I wonder if we want to perform the transform for this case since control flow may not always reach from quot = x / y to rem = x % y ? c) A silly queston, I wonder if vec<gimple *> stmts in convert_to_divmod() will have at most 2 entries (assuming TRUNC_MOD_EXPR stmt is also added in the vector) ? I assume that cse will take place before widening_mul pass executes (since pass_fre/pass_fre executes earlier), and there will be no duplicate gimple stmts ? So vec<gimple *> stmts would contain at most 2 entries - gimple stmt with subcode TRUNC_MOD_EXPR and gimple stmt with subcode TRUNC_DIV_EXPR having same operands. There won't be another gimple stmt with subcode TRUNC_DIV_EXPR with same operands ? d) I am not sure if I have correctly done expansion of divmod to rtx in expand_DIVMOD() (I fear I may be missing many checks that are in expmed.c:expand_divmod). Thanks, Prathamesh > > Richard. > > >> Thank you, >> Prathamesh
2015-11-02 Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> Kugan Vivekanandarajah <kugan.vivekanandara...@linaro.org> * internal-fn.def: Add entry for expand_DIVMOD. * internal-fn.c (expand_DIVMOD): New function. * tree-ssa-math-opts.c: (maybe_record_divmod): Likewise. * tree-ssa-math-opts.c: (convert_to_divmod): Likewise. * tree-ssa-math-opts.c: Include optabs-libfuncs.h. * tree-ssa-math-opts.c: (widen_mul_stats): New member divmod_calls_inserted. * tree-ssa-math-opts.c: (pass_optimize_widening_mul::execute): Call caluclate_dominace_info, compute stmt uids and add check for TRUNC_MOD_EXPR. testsuite/ * gcc.dg/pr43721-1.c: New test. * gcc.dg/pr43721-2.c: Likewise. * gcc.dg/pr43721-3.c: Likewise. * gcc.dg/pr43721-4.c: Likewise.
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index a7da373..1fda966 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -2045,6 +2045,75 @@ expand_GOACC_LOOP (gcall *stmt ATTRIBUTE_UNUSED) gcc_unreachable (); } +/* Expand DIVMOD() using: + a) optab handler for udivmod/sdivmod if it is available + b) If optab_handler doesn't exist, Generate call to optab_libfunc for udivmod/sdivmod. */ + +static void +expand_DIVMOD (gcall *stmt) +{ + tree lhs = gimple_call_lhs (stmt); + tree arg0 = gimple_call_arg (stmt, 0); + tree arg1 = gimple_call_arg (stmt, 1); + + gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE); + tree type = TREE_TYPE (TREE_TYPE (lhs)); + machine_mode mode = TYPE_MODE (type); + optab tab = (TYPE_UNSIGNED (type)) ? udivmod_optab : sdivmod_optab; + + rtx op0 = expand_normal (arg0); + rtx op1 = expand_normal (arg1); + rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + + /* Check if optab handler exists for udivmod/sdivmod. */ + if (optab_handler (tab, mode) != CODE_FOR_nothing) + { + rtx quotient = gen_reg_rtx (mode); + rtx remainder = gen_reg_rtx (mode); + expand_twoval_binop (tab, op0, op1, quotient, remainder, TYPE_UNSIGNED (type)); + + /* Wrap the return value (quotient, remaineder) within COMPLEX_EXPR */ + expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs), + make_tree (TREE_TYPE (arg0), quotient), + make_tree (TREE_TYPE (arg1), remainder)), + target, VOIDmode, EXPAND_NORMAL); + + return; + } + + rtx libfunc = NULL_RTX; + machine_mode compute_mode; + for (compute_mode = mode; + compute_mode != VOIDmode; + compute_mode = GET_MODE_WIDER_MODE (compute_mode)) + { + libfunc = optab_libfunc (tab, compute_mode); + if (libfunc != NULL_RTX) + break; + } + + gcc_assert (libfunc != NULL_RTX); + + if (compute_mode != mode) + { + op0 = convert_modes (compute_mode, GET_MODE (op0), op0, tab); + op1 = convert_modes (compute_mode, GET_MODE (op1), op1, tab); + } + + machine_mode libval_mode = smallest_mode_for_size (2 * GET_MODE_BITSIZE (compute_mode), MODE_INT); + rtx libval = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST, + libval_mode, 2, op0, compute_mode, op1, compute_mode); + + rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0); + rtx remainder = simplify_gen_subreg (mode, libval, libval_mode, GET_MODE_SIZE (compute_mode)); + + /* Wrap the return value (quotient, remaineder) within COMPLEX_EXPR */ + expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs), + make_tree (TREE_TYPE (arg0), quotient), + make_tree (TREE_TYPE (arg1), remainder)), + target, VOIDmode, EXPAND_NORMAL); +} + /* Routines to expand each internal function, indexed by function number. Each routine has the prototype: diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 78266d9..f28579b 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -83,3 +83,5 @@ DEF_INTERNAL_FN (GOACC_DIM_POS, ECF_PURE | ECF_NOTHROW | ECF_LEAF, ".") /* OpenACC looping abstraction. See internal-fn.h for usage. */ DEF_INTERNAL_FN (GOACC_LOOP, ECF_PURE | ECF_NOTHROW, NULL) + +DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL) diff --git a/gcc/testsuite/gcc.dg/pr43721-1.c b/gcc/testsuite/gcc.dg/pr43721-1.c new file mode 100644 index 0000000..8873d9f --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr43721-1.c @@ -0,0 +1,10 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +int f(int x, int y) +{ + int quotient = x / y; + int remainder = x % y; + return quotient + remainder; +} + +/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */ diff --git a/gcc/testsuite/gcc.dg/pr43721-2.c b/gcc/testsuite/gcc.dg/pr43721-2.c new file mode 100644 index 0000000..62d73af --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr43721-2.c @@ -0,0 +1,16 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +int f(int x, int y) +{ + extern int early_exit; + + int quot = x / y; + + if (early_exit) + return 0; + + int rem = x % y; + return quot + rem; +} + +/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */ diff --git a/gcc/testsuite/gcc.dg/pr43721-3.c b/gcc/testsuite/gcc.dg/pr43721-3.c new file mode 100644 index 0000000..74816a0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr43721-3.c @@ -0,0 +1,17 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +int f(int x, int y) +{ + extern int flag; + int quot; + + if (flag) + quot = x / y; + else + quot = 0; + + int rem = x % y; + return quot + rem; +} + +/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */ diff --git a/gcc/testsuite/gcc.dg/pr43721-4.c b/gcc/testsuite/gcc.dg/pr43721-4.c new file mode 100644 index 0000000..fd82ad8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr43721-4.c @@ -0,0 +1,18 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +int f(int x, int y) +{ + int quot = 0; + int rem = 0; + + extern int flag; + + if (flag) + quot = x / y; + else + rem = x % y; + + return quot + rem; +} + +/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */ diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 41fcabf..080858a 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -110,6 +110,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa.h" #include "builtins.h" #include "params.h" +#include "optabs-libfuncs.h" /* This structure represents one basic block that either computes a division, or is a common dominator for basic block that compute a @@ -182,6 +183,9 @@ static struct /* Number of fp fused multiply-add ops inserted. */ int fmas_inserted; + + /* Number of divmod calls inserted. */ + int divmod_calls_inserted; } widen_mul_stats; /* The instance of "struct occurrence" representing the highest @@ -3493,6 +3497,166 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2) return true; } +/* Add use_stmt to stmts if either top_bb or gimple_bb(use_stmt) dominate each other. + If gimple_bb (use_stmt) dominates top_bb, then set top_bb to gimple_bb (use_stmt). */ + +static bool +maybe_record_divmod (vec<gimple *>& stmts, basic_block& top_bb, gimple *use_stmt) +{ + basic_block bb = gimple_bb (use_stmt); + + if (dominated_by_p (CDI_DOMINATORS, top_bb, bb)) + ; + else if (dominated_by_p (CDI_DOMINATORS, bb, top_bb)) + top_bb = bb; + else + return false; + + stmts.safe_push (use_stmt); + return true; +} + +/* This function looks for: + t1 = a TRUNC_DIV_EXPR b; + t2 = a TRUNC_MOD_EXPR b; + and transforms it to the following sequence: + complex_tmp = DIVMOD (a, b); + t1 = REALPART_EXPR(a); + t2 = IMAGPART_EXPR(b); + This change is done only if the target has support for divmod. + + The pass works in two phases: + 1) Walk through all immediate uses of stmt's operand and find a TRUNC_DIV_EXPR with matching operands + and if such a stmt is found add it to stmts vector. + 2) Insert DIVMOD() internal function in the basic block that dominates all statements in stmts + vector (top_bb) and upates statements in stmts vector to use return value of DIVMOD. */ + +static bool +convert_to_divmod (gassign *stmt) +{ + /* Check if divmod is available for the target. */ + enum machine_mode mode = TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))); + const_tree type = TREE_TYPE (gimple_assign_lhs (stmt)); + optab tab = (TYPE_UNSIGNED (type)) ? udivmod_optab : sdivmod_optab; + + if (!(optab_handler (tab, mode) || optab_libfunc (tab, mode))) + return false; + + vec<gimple *> stmts = vNULL; + stmts.safe_push (stmt); + basic_block top_bb = gimple_bb (stmt); + + tree op1 = gimple_assign_rhs1 (stmt); + tree op2 = gimple_assign_rhs2 (stmt); + + use_operand_p use_p; + imm_use_iterator use_iter; + + FOR_EACH_IMM_USE_FAST (use_p, use_iter, op1) + { + gimple *use_stmt = USE_STMT (use_p); + if (is_gimple_assign (use_stmt) + && gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR) + { + tree u_op1 = gimple_assign_rhs1 (use_stmt); + tree u_op2 = gimple_assign_rhs2 (use_stmt); + + if ((operand_equal_p (op1, u_op1, 0) && operand_equal_p (op2, u_op2, 0)) + || (operand_equal_p (op1, u_op2, 0) && operand_equal_p (op2, u_op1, 0))) + maybe_record_divmod (stmts, top_bb, use_stmt); + } + } + + if (stmts.length () == 1) + return false; + + /* Create the library call. */ + gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2); + tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (gimple_assign_lhs (stmt))), call_stmt, "divmod_tmp"); + gimple_call_set_lhs (call_stmt, res); + + widen_mul_stats.divmod_calls_inserted++; + + /* Insert the call-stmt. */ + gimple *def_stmt_op1 = SSA_NAME_DEF_STMT (op1); + bool op1_def_in_bb = + (!SSA_NAME_IS_DEFAULT_DEF (op1) + && gimple_code (def_stmt_op1) != GIMPLE_PHI + && gimple_bb (def_stmt_op1) == top_bb); + + gimple *def_stmt_op2 = SSA_NAME_DEF_STMT (op2); + bool op2_def_in_bb = + (!SSA_NAME_IS_DEFAULT_DEF (op2) + && gimple_code (def_stmt_op2) != GIMPLE_PHI + && gimple_bb (def_stmt_op2) == top_bb); + + /* 3 cases arise where to insert the call to divmod depending upon where op1 and op2 are defined wrt top_bb: + * Case 1: When both def are outside top_bb, simply insert after labels */ + if (!op1_def_in_bb && !op2_def_in_bb) + { + gcc_assert (top_bb != 0); + gimple_stmt_iterator gsi = gsi_after_labels (top_bb); + gsi_insert_before (&gsi, call_stmt, GSI_SAME_STMT); + } + + /* Case 2: When both def are in top_bb. */ + else if (op1_def_in_bb && op2_def_in_bb) + { + gimple_stmt_iterator gsi; + + if (gimple_uid (def_stmt_op1) < gimple_uid (def_stmt_op2)) + gsi = gsi_for_stmt (def_stmt_op2); + else + gsi = gsi_for_stmt (def_stmt_op1); + + gsi_insert_after (&gsi, call_stmt, GSI_SAME_STMT); + } + + /* Case 3: When one def is inside top_bb and one is outside. */ + else + { + gimple_stmt_iterator gsi; + if (op1_def_in_bb) + gsi = gsi_for_stmt (def_stmt_op1); + else + gsi = gsi_for_stmt (def_stmt_op2); + + gsi_insert_after (&gsi, call_stmt, GSI_SAME_STMT); + } + + /* Update stmts. */ + bool cfg_changed = false; + gimple *use_stmt; + for (unsigned i = 0; i < stmts.length (); ++i) + { + tree rhs; + use_stmt = stmts[i]; + + switch (gimple_assign_rhs_code (use_stmt)) + { + case TRUNC_DIV_EXPR: + rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res); + break; + + case TRUNC_MOD_EXPR: + rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op1), res); + break; + + default: + gcc_unreachable (); + } + + gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), rhs); + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); + gsi_replace (&gsi, assign_stmt, true); + if (gimple_purge_dead_eh_edges (gimple_bb (assign_stmt))) + cfg_changed = true; + } + + stmts.release (); + return cfg_changed; +} + /* Find integer multiplications where the operands are extended from smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR where appropriate. */ @@ -3535,8 +3699,21 @@ pass_optimize_widening_mul::execute (function *fun) basic_block bb; bool cfg_changed = false; + calculate_dominance_info (CDI_DOMINATORS); + memset (&widen_mul_stats, 0, sizeof (widen_mul_stats)); + /* Compute stmt uids. */ + FOR_EACH_BB_FN (bb, fun) + { + unsigned uid_no = 0; + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid_no++); + } + } + FOR_EACH_BB_FN (bb, fun) { gimple_stmt_iterator gsi; @@ -3563,6 +3740,10 @@ pass_optimize_widening_mul::execute (function *fun) } break; + case TRUNC_MOD_EXPR: + convert_to_divmod (as_a<gassign *> (stmt)); + break; + case PLUS_EXPR: case MINUS_EXPR: convert_plusminus_to_widen (&gsi, stmt, code); @@ -3614,6 +3795,10 @@ pass_optimize_widening_mul::execute (function *fun) widen_mul_stats.maccs_inserted); statistics_counter_event (fun, "fused multiply-adds inserted", widen_mul_stats.fmas_inserted); + statistics_counter_event (fun, "divmod calls inserted", + widen_mul_stats.divmod_calls_inserted); + + free_dominance_info (CDI_DOMINATORS); return cfg_changed ? TODO_cleanup_cfg : 0; }