On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote: > On 11 November 2015 at 16:03, Richard Biener <rguent...@suse.de> wrote: > > On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote: > > > >> On 10 November 2015 at 20:11, Richard Biener <rguent...@suse.de> wrote: > >> > On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote: > >> > > >> >> On 4 November 2015 at 20:35, Richard Biener <rguent...@suse.de> wrote: > >> >> > > >> >> > Btw, did you investigate code gen differences on x86_64/i586? That > >> >> > target expands all divisions/modulo ops via divmod, relying on CSE > >> >> > solely as the HW always computes both div and mod (IIRC). > >> >> x86_64 has optab_handler for divmod defined, so the transform won't > >> >> take place on x86. > >> > > >> > Ok. > >> > > >> >> > + > >> >> > + gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs > >> >> > (use_stmt), rhs); > >> >> > + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); > >> >> > > >> >> > Ick. Please use > >> >> > > >> >> > gimple_set_rhs_from_tree (use_stmt, res); > >> >> Um there doesn't seem to be gimple_set_rhs_from_tree. > >> >> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt. > >> >> Is that OK ? > >> > > >> > Yes. > >> > > >> >> > update_stmt (use_stmt); > >> >> > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt)) > >> >> > cfg_changed = true; > >> >> > > >> >> > + free_dominance_info (CDI_DOMINATORS); > >> >> > > >> >> > do not free dominators. > >> >> > >> >> I have done the suggested changes in the attached patch. > >> >> I have a few questions: > >> >> > >> >> a) Does the change to insert DIVMOD call before topmost div or mod > >> >> stmt with matching operands > >> >> look correct ? > >> > > >> > + /* Insert call-stmt just before the topmost div/mod stmt. > >> > + top_bb dominates all other basic blocks containing div/mod stms > >> > + so, the topmost stmt would be the first div/mod stmt with matching > >> > operands > >> > + in top_bb. */ > >> > + > >> > + gcc_assert (top_bb != 0); > >> > + gimple_stmt_iterator gsi; > >> > + for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next > >> > (&gsi)) > >> > + { > >> > + gimple *g = gsi_stmt (gsi); > >> > + if (is_gimple_assign (g) > >> > + && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR > >> > + || gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR) > >> > + && operand_equal_p (op1, gimple_assign_rhs1 (g), 0) > >> > + && operand_equal_p (op2, gimple_assign_rhs2 (g), 0)) > >> > + break; > >> > > >> > Looks overly complicated to me. Just remember "topmost" use_stmt > >> > alongside top_bb (looks like you'll no longer need top_bb if you > >> > retail top_stmt). And then do > >> > > >> > gsi = gsi_for_stmt (top_stmt); > >> > > >> > and insert before that. > >> Thanks, done in this patch. Does it look OK ? > >> IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if > >> stmt1 occurs before stmt2 > >> only if stmt1 and stmt2 are in the same basic block ? > >> > > >> >> b) Handling constants - I dropped handling constants in the attached > >> >> patch. IIUC we don't want > >> >> to enable this transform if there's a specialized expansion for some > >> >> constants for div or mod ? > >> > > >> > See expand_divmod which has lots of special cases for constant operands > >> > not requiring target support for div or mod. > >> Thanks, would it be OK if I do this in follow up patch ? > > > > Well, just not handle them like in your patch is fine. > > > >> > > >> >> I suppose this would also be target dependent and require a target hook > >> >> ? > >> >> For instance arm defines modsi3 pattern to expand mod when 2nd operand > >> >> is constant and <= 0 or power of 2, > >> >> while for other cases goes the expand_divmod() route to generate call > >> >> to __aeabi_idivmod libcall. > >> > > >> > Ok, so it lacks a signed mod instruction. > >> > > >> >> c) Gating the divmod transform - > >> >> I tried gating it on checks for optab_handlers on div and mod, however > >> >> this doesn't enable transform for arm cortex-a9 > >> >> anymore (cortex-a9 doesn't have hardware instructions for integer div > >> >> and mod). > >> >> IIUC for cortex-a9, > >> >> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because > >> >> HAVE_divsi3 is 0. > >> >> However optab_handler (smod_optab, SImode) matches since optab_handler > >> >> only checks for existence of pattern > >> >> (and not whether the pattern gets matched). > >> >> I suppose we should enable the transform only if the divmod, div, and > >> >> mod pattern do not match rather than checking > >> >> if the patterns exist via optab_handler ? For a general x % y, modsi3 > >> >> would fail to match but optab_handler(smod_optab, SImode ) still > >> >> says it's matched. > >> > > >> > Ah, of course. Querying for an optab handler is just a cheap > >> > guesstimate... Not sure how to circumvent this best (sub-target > >> > enablement of patterns). RTL expansion just goes ahead (of course) > >> > and sees if expansion eventually fails. Richard? > >> > > >> >> Should we define a new target hook combine_divmod, which returns true > >> >> if transforming to divmod is desirable for that > >> >> target ? > >> >> The default definition could be: > >> >> bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2) > >> >> { > >> >> // check for optab_handlers for div/mod/divmod and libfunc for divmod > >> >> } > >> >> > >> >> And for arm, it could be over-ridden to return false if op2 is > >> >> constant and <= 0 or power of 2. > >> >> I am not really sure if this is a good idea since I am replicating > >> >> information from modsi3 pattern. > >> >> Any change to the pattern may require corresponding change to the hook > >> >> :/ > >> > > >> > Yeah, I don't think that is desirable. Ideally we'd have a way > >> > to query HAVE_* for CODE_FOR_* which would mean target-insns.def > >> > support for all div/mod/divmod patterns(?) and queries... > >> > > >> > Not sure if what would be enough though. > >> > > >> > Note that the divmod check is equally flawed. > >> > > >> > I think with the above I'd enable the transform when > >> > > >> > + if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing > >> > + || (optab_libfunc (divmod_optab, mode) != NULL_RTX > >> > && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing)) > >> > + return false; > >> Um this fails for the arm backend (for cortex-a9) because > >> optab_handler (divmod_optab, mode) != CODE_FOR_nothing is false > >> optab_libfunc (divmod_optab, mode) != NULL_RTX is true. > >> optab_handler (div_optab, mode) == CODE_FOR_nothing is true. > >> which comes down to false || (true && true) which is true and we hit > >> return false. > > > > Oh, sorry to mess up the test - it was supposed to be inverted. > > > >> AFAIU, we want the transform to be disabled if: > >> a) optab_handler exists for divmod. > >> b) optab_handler exists for div. > >> c) optab_libfunc does not exist for divmod. */ > >> > >> + if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing > >> + || optab_handler (div_optab, mode) != CODE_FOR_nothing > >> + || optab_libfunc (divmod_optab, mode) == NULL_RTX) > >> + return false; > >> Does that look correct ? > > > > No. That will disable if we have a divmod optab. Instead try > > > > if (! (optab_handler (divmod_optab, mode) != CODE_FOR_nothing > > || (optab_libfunc (divmod_optab, mode) != NULL_RTX > > && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing))) > > return false; > > > > which is what I intended. If we have a divmod optab go ahead. > > If we have a libfunc and not a div optab then as well. > Oops, I assumed that we only wanted this transform if optab_libfunc existed. > Modified the test in the attached patch. > Well this does affect x86_64 and i?86 -;) > > I added the following hunk back to expand_DIVMOD, since if optab_handler > exists > we want to use it for expansion. Does it look OK ? > > + /* Check if optab handler exists for udivmod/sdivmod. */ > + if (optab_handler (tab, mode) != CODE_FOR_nothing) > + { > + rtx quotient = gen_reg_rtx (mode); > + rtx remainder = gen_reg_rtx (mode); > + expand_twoval_binop (tab, op0, op1, quotient, remainder, > TYPE_UNSIGNED (type)); > + > + /* Wrap the return value (quotient, remaineder) within COMPLEX_EXPR */ > + expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs), > + make_tree (TREE_TYPE (arg0), quotient), > + make_tree (TREE_TYPE (arg1), remainder)), > + target, VOIDmode, EXPAND_NORMAL); > + > + return; > + }
Ah, sure. > I verified the code generated for x86_64 and i?86 and it's same for my > test-cases. > However during a clean build of gcc for x86_64, I am getting segfault > in bid64_div.c: > In file included from > /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid_internal.h:27:0, > from > /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c:56: > /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c: > In function ‘__bid64_div’: > /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid_conf.h:36:19: > internal compiler error: in expand_DIVMOD, at internal-fn.c:2099 > #define bid64_div __bid64_div > ^ > /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c:80:1: > note: in expansion of macro ‘bid64_div’ > bid64_div (UINT64 x, > ^ > 0x8e101f expand_DIVMOD > /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/internal-fn.c:2099 > 0x705927 expand_call_stmt > /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:2549 > 0x705927 expand_gimple_stmt_1 > /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:3509 > 0x705927 expand_gimple_stmt > /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:3672 > 0x708d65 expand_gimple_basic_block > /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:5676 > 0x70e696 execute > /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:6288 > > It looks like in the following code in expand_DIVMOD: > > + rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0); > + rtx remainder = simplify_gen_subreg (mode, libval, libval_mode, > + GET_MODE_SIZE (mode)); > > remainder is (nil) and hence the segfault in make_tree (I added the > asserts for quotient and remainder later). > I am not sure why it's happening though, investigating it. No idea - as said, RTL expansion isn't my area of expertise. I would expect that offsetted subregs shouldn't be used for complex components. Maybe gen_lowpart / gen_highpart will work better which abstracts the subregging. Richard. > Thanks, > Prathamesh > > > >> > > >> > so we either will have a divmod instruction (hopefully not sub-target > >> > disabled for us) or a libfunc for divmod and for sure no HW divide > >> > instruction (HW mod can be emulated by HW divide but not the other > >> > way around). > >> > > >> >> d) Adding effective-target-check for divmod: I just enabled it for > >> >> arm*-*-* for now. I could additionally append more targets, > >> >> not sure if this is the right approach. > >> > > >> > Looks good to me. > >> Is this version OK if bootstrap/testing passes ? > > > > Ok with adjusting the optab check like above. > > > > Thanks, > > Richard.