On Tue, 2013-09-03 at 12:57 +0200, Richard Biener wrote: > On Fri, Jun 21, 2013 at 4:12 AM, Kugan > <kugan.vivekanandara...@linaro.org> wrote: > > On 17/06/13 19:07, Richard Biener wrote: > >> > >> On Mon, 17 Jun 2013, Kugan wrote: > >> > >>> Hi, > >>> > >>> I am attempting to fix Bug 43721 - Failure to optimise (a/b) and (a%b) > >>> into > >>> single __aeabi_idivmod call > >>> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43721) > >>> > >>> execute_cse_sincos tree level pass does similar cse so I attempted to use > >>> similar approach here. Div/mod cse is not really using built-in functions > >>> though at this level. > >> > >> > >> The issue with performing the transform at the same time as we > >> transform SINCOS is that the vectorizer will now no longer be able > >> to vectorize these loops. It would need to be taught how to > >> handle the builtin calls (basically undo the transformation, I don't > >> know of any ISA that can do vectorized combined div/mod). Which > >> means it should rather be done at the point we CSE reciprocals > >> (which also replaces computes with builtin target function calls). > >> > > Thanks Richard. Since execute_cse_reciprocals is handling reciprocals only, > > I added another pass to do divmod. Is that OK? > > > > > >>> For the case of div and mod operations, after CSE is performed, there > >>> isnt a > >>> way to represent the resulting stament in gimple. We will endup with > >>> divmod > >>> taking two arguments and returning double the size of one arguments in > >>> the > >>> three address format (divmod will return reminder and quotient so the > >>> return > >>> type is double the size of argument type). > >>> > >>> Since GIMPLE_ASSIGN will result in type checking failure in this case, I > >>> atempted use built-in functions (GIMPLE_CALL to represent the runtime > >>> library > >>> call). Name for the function here is target specific and can be obtained > >>> from > >>> sdivmod_optab so the builtin function name defined in tree level is not > >>> used. > >>> I am not entirelt sure this is the right approach so I am attaching the > >>> first > >>> cut of the patch to get your feedback and understand the right approach > >>> to > >>> this problem. > >> > >> > >> If we don't want to expose new builtins to the user (I'm not sure we > >> want that), then using "internal functions" is an easier way to > >> avoid these issues (see gimple.h and internal-fn.(def|h)). > >> > > I have now changed to use internal functions. Thanks for that. > > > > > >> Generally the transform looks useful to me as it moves forward with > >> the general idea of moving pattern recognition done during RTL expansion > >> to an earlier place. > >> > >> For the use of a larger integer type and shifts to represent > >> the modulo and division result I don't think that's the very best > >> idea. Instead resorting to a complex integer type as return > >> value looks more appealing (similar to sincos using cexpi here). > >> That way you also avoid the ugly hard-coding of bit-sizes. > >> > > > > I have changed it to use complex integers now. > > > > > >> + if (HAVE_divsi3 > >> + || (GET_MODE_BITSIZE (TYPE_MODE (type)) != 32) > >> > >> watch out for types whose TYPE_PRECISION is not the bitsize > >> of their mode. Also it should be GET_MODE_PRECISION here. > >> > >> + || !optab_libfunc (TYPE_UNSIGNED (type)? udivmod_optab : > >> sdivmod_optab, > >> + TYPE_MODE (type))) > >> > >> targets that use a libfunc should also get this optimization, as > >> it always removes computations. I think the proper test is > >> for whether the target can do division and/or modulus without > >> using a libfunc, not whether there is a divmod optab/libfunc. > >> > > I guess best way to do is by defining a target hook and let the target > > define the required behaviour. Is that what you had in mind? > > > > I have attached a modified patch with these changes. > > +@deftypefn {Target Hook} bool TARGET_COMBINE_DIVMOD (enum > machine_mode @var{mode}) > > rather than talking about libcalls I'd say the target should indicate whether > it has a way to optimize division and modulo with the same operands for given > mode. As we have sdivmod_optab and udivmod_optab already you should > check whether for the given mode it is != CODE_for_nothing. Or even better, > follow some of the vectorizer expansion stuff and separate out a > predicate from expand_divmod () that tells you whether it would do something > interesting. > > NEXT_PASS (pass_cse_reciprocals); > + NEXT_PASS (pass_cse_divmod); > > of course I've meant that you perform divmod detection at the same > time reciprocals are optimized - thus in the very same pass. > > Sorry for taking so long to review this - the patch now needs updating > for the pass-manager C++-ification, too.
The NEXT_PASS addition now needs to happen to passes.def, rather than passes.c The opt_pass structs have become an inheritance hierarchy, with their const data moving to a pass_data struct. Hopefully the pattern is obvious from looking at other passes. FWIW, I wrote a script to make these latter changes, which may work for your new pass, by running refactor_passes.py from https://github.com/davidmalcolm/gcc-refactoring-scripts It expects a source tree in "../src" relative to the checkout of the refactoring scripts, but you can pass in a path to a specific files to have it run on them. Tested with Python 2.7; may run with 2.6. It edits files in-place, including the relevant ChangeLog file. I wouldn't run the script *until* you've merged/rebased the other changes from trunk back into your tree (assuming you're using git; from your patch it looks like you are). Hope this is helpful Dave