On Fri, Jun 21, 2013 at 4:12 AM, Kugan <kugan.vivekanandara...@linaro.org> wrote: > On 17/06/13 19:07, Richard Biener wrote: >> >> On Mon, 17 Jun 2013, Kugan wrote: >> >>> Hi, >>> >>> I am attempting to fix Bug 43721 - Failure to optimise (a/b) and (a%b) >>> into >>> single __aeabi_idivmod call >>> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43721) >>> >>> execute_cse_sincos tree level pass does similar cse so I attempted to use >>> similar approach here. Div/mod cse is not really using built-in functions >>> though at this level. >> >> >> The issue with performing the transform at the same time as we >> transform SINCOS is that the vectorizer will now no longer be able >> to vectorize these loops. It would need to be taught how to >> handle the builtin calls (basically undo the transformation, I don't >> know of any ISA that can do vectorized combined div/mod). Which >> means it should rather be done at the point we CSE reciprocals >> (which also replaces computes with builtin target function calls). >> > Thanks Richard. Since execute_cse_reciprocals is handling reciprocals only, > I added another pass to do divmod. Is that OK? > > >>> For the case of div and mod operations, after CSE is performed, there >>> isnt a >>> way to represent the resulting stament in gimple. We will endup with >>> divmod >>> taking two arguments and returning double the size of one arguments in >>> the >>> three address format (divmod will return reminder and quotient so the >>> return >>> type is double the size of argument type). >>> >>> Since GIMPLE_ASSIGN will result in type checking failure in this case, I >>> atempted use built-in functions (GIMPLE_CALL to represent the runtime >>> library >>> call). Name for the function here is target specific and can be obtained >>> from >>> sdivmod_optab so the builtin function name defined in tree level is not >>> used. >>> I am not entirelt sure this is the right approach so I am attaching the >>> first >>> cut of the patch to get your feedback and understand the right approach >>> to >>> this problem. >> >> >> If we don't want to expose new builtins to the user (I'm not sure we >> want that), then using "internal functions" is an easier way to >> avoid these issues (see gimple.h and internal-fn.(def|h)). >> > I have now changed to use internal functions. Thanks for that. > > >> Generally the transform looks useful to me as it moves forward with >> the general idea of moving pattern recognition done during RTL expansion >> to an earlier place. >> >> For the use of a larger integer type and shifts to represent >> the modulo and division result I don't think that's the very best >> idea. Instead resorting to a complex integer type as return >> value looks more appealing (similar to sincos using cexpi here). >> That way you also avoid the ugly hard-coding of bit-sizes. >> > > I have changed it to use complex integers now. > > >> + if (HAVE_divsi3 >> + || (GET_MODE_BITSIZE (TYPE_MODE (type)) != 32) >> >> watch out for types whose TYPE_PRECISION is not the bitsize >> of their mode. Also it should be GET_MODE_PRECISION here. >> >> + || !optab_libfunc (TYPE_UNSIGNED (type)? udivmod_optab : >> sdivmod_optab, >> + TYPE_MODE (type))) >> >> targets that use a libfunc should also get this optimization, as >> it always removes computations. I think the proper test is >> for whether the target can do division and/or modulus without >> using a libfunc, not whether there is a divmod optab/libfunc. >> > I guess best way to do is by defining a target hook and let the target > define the required behaviour. Is that what you had in mind? > > I have attached a modified patch with these changes.
+@deftypefn {Target Hook} bool TARGET_COMBINE_DIVMOD (enum machine_mode @var{mode}) rather than talking about libcalls I'd say the target should indicate whether it has a way to optimize division and modulo with the same operands for given mode. As we have sdivmod_optab and udivmod_optab already you should check whether for the given mode it is != CODE_for_nothing. Or even better, follow some of the vectorizer expansion stuff and separate out a predicate from expand_divmod () that tells you whether it would do something interesting. NEXT_PASS (pass_cse_reciprocals); + NEXT_PASS (pass_cse_divmod); of course I've meant that you perform divmod detection at the same time reciprocals are optimized - thus in the very same pass. Sorry for taking so long to review this - the patch now needs updating for the pass-manager C++-ification, too. Thanks, Richard. > >> Others knowing this piece of the compiler better may want to comment >> here, of course. >> >> Thanks, >> Richard. >> > > > > Thanks, > Kugan