Re: [PR43721] Failure to optimise (a/b) and (a%b) into single call

David Malcolm Tue, 03 Sep 2013 09:14:37 -0700

On Tue, 2013-09-03 at 12:57 +0200, Richard Biener wrote:
> On Fri, Jun 21, 2013 at 4:12 AM, Kugan
> <kugan.vivekanandara...@linaro.org> wrote:
> > On 17/06/13 19:07, Richard Biener wrote:
> >>
> >> On Mon, 17 Jun 2013, Kugan wrote:
> >>
> >>> Hi,
> >>>
> >>> I am attempting to fix Bug 43721 - Failure to optimise (a/b) and (a%b)
> >>> into
> >>> single __aeabi_idivmod call
> >>> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43721)
> >>>
> >>> execute_cse_sincos tree level pass does similar cse so I attempted to use
> >>> similar approach here. Div/mod cse is not really using built-in functions
> >>> though at this level.
> >>
> >>
> >> The issue with performing the transform at the same time as we
> >> transform SINCOS is that the vectorizer will now no longer be able
> >> to vectorize these loops.  It would need to be taught how to
> >> handle the builtin calls (basically undo the transformation, I don't
> >> know of any ISA that can do vectorized combined div/mod).  Which
> >> means it should rather be done at the point we CSE reciprocals
> >> (which also replaces computes with builtin target function calls).
> >>
> > Thanks Richard. Since execute_cse_reciprocals is handling reciprocals only,
> > I added another pass to do divmod. Is that OK?
> >
> >
> >>> For the case of div and mod operations, after CSE is performed, there
> >>> isnt a
> >>> way to represent the resulting stament in gimple. We will endup with
> >>> divmod
> >>> taking two arguments and returning double the size of one arguments in
> >>> the
> >>> three address format (divmod will return reminder and quotient so the
> >>> return
> >>> type is double the size of argument type).
> >>>
> >>> Since GIMPLE_ASSIGN will result in type checking failure in this case, I
> >>> atempted use built-in functions (GIMPLE_CALL to represent the runtime
> >>> library
> >>> call). Name for the function here  is target specific and can be obtained
> >>> from
> >>> sdivmod_optab so the builtin function name defined in tree level is not
> >>> used.
> >>> I am not entirelt sure this is the right approach so I am attaching the
> >>> first
> >>> cut of the patch to get your feedback and understand the right approach
> >>> to
> >>> this problem.
> >>
> >>
> >> If we don't want to expose new builtins to the user (I'm not sure we
> >> want that), then using "internal functions" is an easier way to
> >> avoid these issues (see gimple.h and internal-fn.(def|h)).
> >>
> > I have now changed to use internal functions. Thanks for that.
> >
> >
> >> Generally the transform looks useful to me as it moves forward with
> >> the general idea of moving pattern recognition done during RTL expansion
> >> to an earlier place.
> >>
> >> For the use of a larger integer type and shifts to represent
> >> the modulo and division result I don't think that's the very best
> >> idea.  Instead resorting to a complex integer type as return
> >> value looks more appealing (similar to sincos using cexpi here).
> >> That way you also avoid the ugly hard-coding of bit-sizes.
> >>
> >
> > I have changed it to use complex integers now.
> >
> >
> >> +  if (HAVE_divsi3
> >> +       || (GET_MODE_BITSIZE (TYPE_MODE (type)) != 32)
> >>
> >> watch out for types whose TYPE_PRECISION is not the bitsize
> >> of their mode.  Also it should be GET_MODE_PRECISION here.
> >>
> >> +       || !optab_libfunc (TYPE_UNSIGNED (type)? udivmod_optab :
> >> sdivmod_optab,
> >> +                            TYPE_MODE (type)))
> >>
> >> targets that use a libfunc should also get this optimization, as
> >> it always removes computations.  I think the proper test is
> >> for whether the target can do division and/or modulus without
> >> using a libfunc, not whether there is a divmod optab/libfunc.
> >>
> > I guess best way to do is by defining a target hook and let the target
> > define the required behaviour. Is that what you had in mind?
> >
> > I have attached a modified patch with these changes.
> 
> +@deftypefn {Target Hook} bool TARGET_COMBINE_DIVMOD (enum
> machine_mode @var{mode})
> 
> rather than talking about libcalls I'd say the target should indicate whether
> it has a way to optimize division and modulo with the same operands for given
> mode.  As we have sdivmod_optab and udivmod_optab already you should
> check whether for the given mode it is != CODE_for_nothing.  Or even better,
> follow some of the vectorizer expansion stuff and separate out a
> predicate from expand_divmod () that tells you whether it would do something
> interesting.
> 
>        NEXT_PASS (pass_cse_reciprocals);
> +      NEXT_PASS (pass_cse_divmod);
> 
> of course I've meant that you perform divmod detection at the same
> time reciprocals are optimized - thus in the very same pass.
> 
> Sorry for taking so long to review this - the patch now needs updating
> for the pass-manager C++-ification, too.


The NEXT_PASS addition now needs to happen to passes.def, rather than
passes.c

The opt_pass structs have become an inheritance hierarchy, with their
const data moving to a pass_data struct.  Hopefully the pattern is
obvious from looking at other passes.

FWIW, I wrote a script to make these latter changes, which may work for
your new pass, by running refactor_passes.py from
https://github.com/davidmalcolm/gcc-refactoring-scripts
It expects a source tree in "../src" relative to the checkout of the
refactoring scripts, but you can pass in a path to a specific files to
have it run on them.   Tested with Python 2.7; may run with 2.6.  It
edits files in-place, including the relevant ChangeLog file.

I wouldn't run the script *until* you've merged/rebased the other
changes from trunk back into your tree (assuming you're using git; from
your patch it looks like you are).

Hope this is helpful
Dave

Re: [PR43721] Failure to optimise (a/b) and (a%b) into single call

Reply via email to