On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote: > On 4 November 2015 at 20:35, Richard Biener <rguent...@suse.de> wrote: > > > > Btw, did you investigate code gen differences on x86_64/i586? That > > target expands all divisions/modulo ops via divmod, relying on CSE > > solely as the HW always computes both div and mod (IIRC). > x86_64 has optab_handler for divmod defined, so the transform won't > take place on x86.
Ok. > > + > > + gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs > > (use_stmt), rhs); > > + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); > > > > Ick. Please use > > > > gimple_set_rhs_from_tree (use_stmt, res); > Um there doesn't seem to be gimple_set_rhs_from_tree. > I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt. > Is that OK ? Yes. > > update_stmt (use_stmt); > > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt)) > > cfg_changed = true; > > > > + free_dominance_info (CDI_DOMINATORS); > > > > do not free dominators. > > I have done the suggested changes in the attached patch. > I have a few questions: > > a) Does the change to insert DIVMOD call before topmost div or mod > stmt with matching operands > look correct ? + /* Insert call-stmt just before the topmost div/mod stmt. + top_bb dominates all other basic blocks containing div/mod stms + so, the topmost stmt would be the first div/mod stmt with matching operands + in top_bb. */ + + gcc_assert (top_bb != 0); + gimple_stmt_iterator gsi; + for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *g = gsi_stmt (gsi); + if (is_gimple_assign (g) + && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR + || gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR) + && operand_equal_p (op1, gimple_assign_rhs1 (g), 0) + && operand_equal_p (op2, gimple_assign_rhs2 (g), 0)) + break; Looks overly complicated to me. Just remember "topmost" use_stmt alongside top_bb (looks like you'll no longer need top_bb if you retail top_stmt). And then do gsi = gsi_for_stmt (top_stmt); and insert before that. > b) Handling constants - I dropped handling constants in the attached > patch. IIUC we don't want > to enable this transform if there's a specialized expansion for some > constants for div or mod ? See expand_divmod which has lots of special cases for constant operands not requiring target support for div or mod. > I suppose this would also be target dependent and require a target hook ? > For instance arm defines modsi3 pattern to expand mod when 2nd operand > is constant and <= 0 or power of 2, > while for other cases goes the expand_divmod() route to generate call > to __aeabi_idivmod libcall. Ok, so it lacks a signed mod instruction. > c) Gating the divmod transform - > I tried gating it on checks for optab_handlers on div and mod, however > this doesn't enable transform for arm cortex-a9 > anymore (cortex-a9 doesn't have hardware instructions for integer div and > mod). > IIUC for cortex-a9, > optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because > HAVE_divsi3 is 0. > However optab_handler (smod_optab, SImode) matches since optab_handler > only checks for existence of pattern > (and not whether the pattern gets matched). > I suppose we should enable the transform only if the divmod, div, and > mod pattern do not match rather than checking > if the patterns exist via optab_handler ? For a general x % y, modsi3 > would fail to match but optab_handler(smod_optab, SImode ) still > says it's matched. Ah, of course. Querying for an optab handler is just a cheap guesstimate... Not sure how to circumvent this best (sub-target enablement of patterns). RTL expansion just goes ahead (of course) and sees if expansion eventually fails. Richard? > Should we define a new target hook combine_divmod, which returns true > if transforming to divmod is desirable for that > target ? > The default definition could be: > bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2) > { > // check for optab_handlers for div/mod/divmod and libfunc for divmod > } > > And for arm, it could be over-ridden to return false if op2 is > constant and <= 0 or power of 2. > I am not really sure if this is a good idea since I am replicating > information from modsi3 pattern. > Any change to the pattern may require corresponding change to the hook :/ Yeah, I don't think that is desirable. Ideally we'd have a way to query HAVE_* for CODE_FOR_* which would mean target-insns.def support for all div/mod/divmod patterns(?) and queries... Not sure if what would be enough though. Note that the divmod check is equally flawed. I think with the above I'd enable the transform when + if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing + || (optab_libfunc (divmod_optab, mode) != NULL_RTX && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing)) + return false; so we either will have a divmod instruction (hopefully not sub-target disabled for us) or a libfunc for divmod and for sure no HW divide instruction (HW mod can be emulated by HW divide but not the other way around). > d) Adding effective-target-check for divmod: I just enabled it for > arm*-*-* for now. I could additionally append more targets, > not sure if this is the right approach. Looks good to me. Thanks, Richard.