On Wed, 27 Jan 2016, Prathamesh Kulkarni wrote: > On 11 November 2015 at 19:04, Richard Biener <rguent...@suse.de> wrote: > > On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote: > > > >> On 11 November 2015 at 16:03, Richard Biener <rguent...@suse.de> wrote: > >> > On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote: > >> > > >> >> On 10 November 2015 at 20:11, Richard Biener <rguent...@suse.de> wrote: > >> >> > On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote: > >> >> > > >> >> >> On 4 November 2015 at 20:35, Richard Biener <rguent...@suse.de> > >> >> >> wrote: > >> >> >> > > >> >> >> > Btw, did you investigate code gen differences on x86_64/i586? That > >> >> >> > target expands all divisions/modulo ops via divmod, relying on CSE > >> >> >> > solely as the HW always computes both div and mod (IIRC). > >> >> >> x86_64 has optab_handler for divmod defined, so the transform won't > >> >> >> take place on x86. > >> >> > > >> >> > Ok. > >> >> > > >> >> >> > + > >> >> >> > + gassign *assign_stmt = gimple_build_assign > >> >> >> > (gimple_assign_lhs > >> >> >> > (use_stmt), rhs); > >> >> >> > + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); > >> >> >> > > >> >> >> > Ick. Please use > >> >> >> > > >> >> >> > gimple_set_rhs_from_tree (use_stmt, res); > >> >> >> Um there doesn't seem to be gimple_set_rhs_from_tree. > >> >> >> I used gimple_assign_set_rhs_from_tree which requires gsi for > >> >> >> use_stmt. > >> >> >> Is that OK ? > >> >> > > >> >> > Yes. > >> >> > > >> >> >> > update_stmt (use_stmt); > >> >> >> > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt)) > >> >> >> > cfg_changed = true; > >> >> >> > > >> >> >> > + free_dominance_info (CDI_DOMINATORS); > >> >> >> > > >> >> >> > do not free dominators. > >> >> >> > >> >> >> I have done the suggested changes in the attached patch. > >> >> >> I have a few questions: > >> >> >> > >> >> >> a) Does the change to insert DIVMOD call before topmost div or mod > >> >> >> stmt with matching operands > >> >> >> look correct ? > >> >> > > >> >> > + /* Insert call-stmt just before the topmost div/mod stmt. > >> >> > + top_bb dominates all other basic blocks containing div/mod stms > >> >> > + so, the topmost stmt would be the first div/mod stmt with > >> >> > matching > >> >> > operands > >> >> > + in top_bb. */ > >> >> > + > >> >> > + gcc_assert (top_bb != 0); > >> >> > + gimple_stmt_iterator gsi; > >> >> > + for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next > >> >> > (&gsi)) > >> >> > + { > >> >> > + gimple *g = gsi_stmt (gsi); > >> >> > + if (is_gimple_assign (g) > >> >> > + && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR > >> >> > + || gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR) > >> >> > + && operand_equal_p (op1, gimple_assign_rhs1 (g), 0) > >> >> > + && operand_equal_p (op2, gimple_assign_rhs2 (g), 0)) > >> >> > + break; > >> >> > > >> >> > Looks overly complicated to me. Just remember "topmost" use_stmt > >> >> > alongside top_bb (looks like you'll no longer need top_bb if you > >> >> > retail top_stmt). And then do > >> >> > > >> >> > gsi = gsi_for_stmt (top_stmt); > >> >> > > >> >> > and insert before that. > >> >> Thanks, done in this patch. Does it look OK ? > >> >> IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if > >> >> stmt1 occurs before stmt2 > >> >> only if stmt1 and stmt2 are in the same basic block ? > >> >> > > >> >> >> b) Handling constants - I dropped handling constants in the attached > >> >> >> patch. IIUC we don't want > >> >> >> to enable this transform if there's a specialized expansion for some > >> >> >> constants for div or mod ? > >> >> > > >> >> > See expand_divmod which has lots of special cases for constant > >> >> > operands > >> >> > not requiring target support for div or mod. > >> >> Thanks, would it be OK if I do this in follow up patch ? > >> > > >> > Well, just not handle them like in your patch is fine. > >> > > >> >> > > >> >> >> I suppose this would also be target dependent and require a target > >> >> >> hook ? > >> >> >> For instance arm defines modsi3 pattern to expand mod when 2nd > >> >> >> operand > >> >> >> is constant and <= 0 or power of 2, > >> >> >> while for other cases goes the expand_divmod() route to generate call > >> >> >> to __aeabi_idivmod libcall. > >> >> > > >> >> > Ok, so it lacks a signed mod instruction. > >> >> > > >> >> >> c) Gating the divmod transform - > >> >> >> I tried gating it on checks for optab_handlers on div and mod, > >> >> >> however > >> >> >> this doesn't enable transform for arm cortex-a9 > >> >> >> anymore (cortex-a9 doesn't have hardware instructions for integer > >> >> >> div and mod). > >> >> >> IIUC for cortex-a9, > >> >> >> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing > >> >> >> because > >> >> >> HAVE_divsi3 is 0. > >> >> >> However optab_handler (smod_optab, SImode) matches since > >> >> >> optab_handler > >> >> >> only checks for existence of pattern > >> >> >> (and not whether the pattern gets matched). > >> >> >> I suppose we should enable the transform only if the divmod, div, and > >> >> >> mod pattern do not match rather than checking > >> >> >> if the patterns exist via optab_handler ? For a general x % y, modsi3 > >> >> >> would fail to match but optab_handler(smod_optab, SImode ) still > >> >> >> says it's matched. > >> >> > > >> >> > Ah, of course. Querying for an optab handler is just a cheap > >> >> > guesstimate... Not sure how to circumvent this best (sub-target > >> >> > enablement of patterns). RTL expansion just goes ahead (of course) > >> >> > and sees if expansion eventually fails. Richard? > >> >> > > >> >> >> Should we define a new target hook combine_divmod, which returns true > >> >> >> if transforming to divmod is desirable for that > >> >> >> target ? > >> >> >> The default definition could be: > >> >> >> bool default_combine_divmod (enum machine_mode mode, tree op1, tree > >> >> >> op2) > >> >> >> { > >> >> >> // check for optab_handlers for div/mod/divmod and libfunc for > >> >> >> divmod > >> >> >> } > >> >> >> > >> >> >> And for arm, it could be over-ridden to return false if op2 is > >> >> >> constant and <= 0 or power of 2. > >> >> >> I am not really sure if this is a good idea since I am replicating > >> >> >> information from modsi3 pattern. > >> >> >> Any change to the pattern may require corresponding change to the > >> >> >> hook :/ > >> >> > > >> >> > Yeah, I don't think that is desirable. Ideally we'd have a way > >> >> > to query HAVE_* for CODE_FOR_* which would mean target-insns.def > >> >> > support for all div/mod/divmod patterns(?) and queries... > >> >> > > >> >> > Not sure if what would be enough though. > >> >> > > >> >> > Note that the divmod check is equally flawed. > >> >> > > >> >> > I think with the above I'd enable the transform when > >> >> > > >> >> > + if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing > >> >> > + || (optab_libfunc (divmod_optab, mode) != NULL_RTX > >> >> > && optab_handler ([su]div_optab, mode) == > >> >> > CODE_FOR_nothing)) > >> >> > + return false; > >> >> Um this fails for the arm backend (for cortex-a9) because > >> >> optab_handler (divmod_optab, mode) != CODE_FOR_nothing is false > >> >> optab_libfunc (divmod_optab, mode) != NULL_RTX is true. > >> >> optab_handler (div_optab, mode) == CODE_FOR_nothing is true. > >> >> which comes down to false || (true && true) which is true and we hit > >> >> return false. > >> > > >> > Oh, sorry to mess up the test - it was supposed to be inverted. > >> > > >> >> AFAIU, we want the transform to be disabled if: > >> >> a) optab_handler exists for divmod. > >> >> b) optab_handler exists for div. > >> >> c) optab_libfunc does not exist for divmod. */ > >> >> > >> >> + if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing > >> >> + || optab_handler (div_optab, mode) != CODE_FOR_nothing > >> >> + || optab_libfunc (divmod_optab, mode) == NULL_RTX) > >> >> + return false; > >> >> Does that look correct ? > >> > > >> > No. That will disable if we have a divmod optab. Instead try > >> > > >> > if (! (optab_handler (divmod_optab, mode) != CODE_FOR_nothing > >> > || (optab_libfunc (divmod_optab, mode) != NULL_RTX > >> > && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing))) > >> > return false; > >> > > >> > which is what I intended. If we have a divmod optab go ahead. > >> > If we have a libfunc and not a div optab then as well. > >> Oops, I assumed that we only wanted this transform if optab_libfunc > >> existed. > >> Modified the test in the attached patch. > >> Well this does affect x86_64 and i?86 -;) > >> > >> I added the following hunk back to expand_DIVMOD, since if optab_handler > >> exists > >> we want to use it for expansion. Does it look OK ? > >> > >> + /* Check if optab handler exists for udivmod/sdivmod. */ > >> + if (optab_handler (tab, mode) != CODE_FOR_nothing) > >> + { > >> + rtx quotient = gen_reg_rtx (mode); > >> + rtx remainder = gen_reg_rtx (mode); > >> + expand_twoval_binop (tab, op0, op1, quotient, remainder, > >> TYPE_UNSIGNED (type)); > >> + > >> + /* Wrap the return value (quotient, remaineder) within COMPLEX_EXPR > >> */ > >> + expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs), > >> + make_tree (TREE_TYPE (arg0), quotient), > >> + make_tree (TREE_TYPE (arg1), remainder)), > >> + target, VOIDmode, EXPAND_NORMAL); > >> + > >> + return; > >> + } > > > > Ah, sure. > > > >> I verified the code generated for x86_64 and i?86 and it's same for my > >> test-cases. > >> However during a clean build of gcc for x86_64, I am getting segfault > >> in bid64_div.c: > >> In file included from > >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid_internal.h:27:0, > >> from > >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c:56: > >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c: > >> In function ‘__bid64_div’: > >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid_conf.h:36:19: > >> internal compiler error: in expand_DIVMOD, at internal-fn.c:2099 > >> #define bid64_div __bid64_div > >> ^ > >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c:80:1: > >> note: in expansion of macro ‘bid64_div’ > >> bid64_div (UINT64 x, > >> ^ > >> 0x8e101f expand_DIVMOD > >> > >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/internal-fn.c:2099 > >> 0x705927 expand_call_stmt > >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:2549 > >> 0x705927 expand_gimple_stmt_1 > >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:3509 > >> 0x705927 expand_gimple_stmt > >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:3672 > >> 0x708d65 expand_gimple_basic_block > >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:5676 > >> 0x70e696 execute > >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:6288 > >> > >> It looks like in the following code in expand_DIVMOD: > >> > >> + rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0); > >> + rtx remainder = simplify_gen_subreg (mode, libval, libval_mode, > >> + GET_MODE_SIZE (mode)); > >> > >> remainder is (nil) and hence the segfault in make_tree (I added the > >> asserts for quotient and remainder later). > >> I am not sure why it's happening though, investigating it. > > > > No idea - as said, RTL expansion isn't my area of expertise. I would > > expect that offsetted subregs shouldn't be used for complex components. > > Maybe gen_lowpart / gen_highpart will work better which abstracts > > the subregging. > Hi Richard, > This is a revamped version of my previous patch for divmod transform. > The segfault with that patch can be reproduced with following > test-case on x86_64 with -m32: > typedef unsigned int DItype __attribute__((mode(DI))); > > DItype f (DItype x, DItype y) > { > DItype quot, rem; > > quot = x / y; > rem = x % y; > return quot + rem; > } > > Jim pointed out to me that happens because target-specific divmod > libfuncs have different calling conventions, there is no "standard" > calling convention for divmod in libgcc. > The divmod libfunc in this case is libgcc2.c:__udivmoddi4() which has > a different calling convention from arm's __aeabi_divmod() and I was > assuming > that all divmod's follow arm's calling convention. > The arm version expects that op0 and op1 are passed as arguments and > both div,rem are returned with return value having mode twice that of > it's arguments. > whereas libgcc2.c:__udivmoddi4() takes 3 args: op0, op1 of unsigned > DImode and 3rd arg is pointer used for storing remainder while > return value contains quotient. > Similarly spu's divmod libfuncs follow libgcc2.c:__udivmoddi4()'s convention. > > To workaround this, I defined a new hook expand_divmod_libfunc, which > targets must override for expanding call to target-specific dimovd. > The "default" hook default_expand_divmod_libfunc() expands call to > libgcc2.c:__udivmoddi4() since that's the only "generic" divmod > available. > Is this a reasonable approach ?
Hum. How do they get to expand/generate it today? That said, I'm no expert in this area. A simpler solution may be to not do the transform if there is no instruction for divmod. > Expansion proceeds as follows: > expand_DIVMOD checks if optab_handler for udivmod/sdivmod exists and > if it does, uses expand_twoval_binop() for expansion. > else it calls the target hook for generating call to divmod libfunc. > > The divmod transform takes place if: > a) optab_handler exists for divmod or > b) optab_libfunc (divmod) exists and optab_handler(div) doesn't. > and target overrides expand_divmod_libfunc or for the default function, > mode is DImode and TYPE_UNSIGNED (type) is true. > > Regarding test-cases, I added effective-check for divmod for only arm > However DIVMOD transforms will also take place for targets having > hardware div instructions. > Should I need to manually add each configuration to > check_effective_target_divmod() that have > hardware div or is there a better way to check that ? You have to manually add to that. > The patch passes bootstrap and testing on x86_64-unknown-linux-gnu > and cross-tested on arm-linux-gnueabihf, arm-none-eabi, > armeb-none-linux-gnueabihf, It's now too late for this, please queue it for GCC 7. Thanks, Richard.