Richard Biener <richard.guent...@gmail.com> writes: > On Mon, Nov 9, 2015 at 10:03 PM, Michael Matz <m...@suse.de> wrote: >> Hi, >> >> On Mon, 9 Nov 2015, Richard Sandiford wrote: >> >>> +static bool >>> +can_use_internal_fn (gcall *call) >>> +{ >>> + /* Only replace calls that set errno. */ >>> + if (!gimple_vdef (call)) >>> + return false; >> >> Oh, I managed to confuse this in my head while reading the patch. So, >> hmm, you don't actually replace the builtin with an internal function >> (without the condition) under no-errno-math? Does something else do that? >> Because otherwise that seems an unnecessary restriction? >> >>> >> r229916 fixed that for the non-EH case. >>> > >>> > Ah, missed it. Even the EH case shouldn't be difficult. If the >>> > original dominator of the EH destination was the call block it moves, >>> > otherwise it remains unchanged. >>> >>> The target of the edge is easy in itself, I agree, but that isn't >>> necessarily the only affected block, if the EH handler doesn't >>> exit or rethrow. >> >> You're worried the non-EH and the EH regions merge again, right? Like so: >> >> before change: >> >> BB1: throwing-call >> fallthru/ \EH >> BB2 BBeh >> | /....\ (stuff in EH-region) >> | / some path out of EH region >> | /------------------/ >> BB3 >> >> Here, BB3 must at least be dominated by BB1 (the throwing block), or by >> something further up (when there are other side-entries to the path >> BB2->BB3 or into the EH region). When further up, nothing changes, when >> it's BB1, then it's afterwards dominated by the BB containing the >> condition. So everything with idom==BB1 gets idom=Bcond, except for BBeh, >> which gets idom=Bcall. Depending on how you split BB1, either Bcond or >> BBcall might still be BB1 and doesn't lead to changes in the dom tree. >> >>> > Currently we have quite some of such passes (reassoc, forwprop, >>> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap >>> > and others), but they are all handling only special situations in one >>> > way or the other. pass_fold_builtins is another one, but it seems >>> > most related to what you want (replacing a call with something else), >>> > so I thought that'd be the natural choice. >>> >>> Well, to be pedantic, it's not really replacing the call. Except for >>> the special case of targets that support direct assignments to errno, >>> it keeps the original call but ensures that it isn't usually executed. >>> From that point of view it doesn't really seem like a fold. >>> >>> But I suppose that's just naming again :-). And it's easily solved with >>> s/fold/rewrite/. >> >> Exactly, in my mind pass_fold_builtin (like many of the others I >> mentioned) doesn't do folding but rewriting :) > > So I am replying here to the issue of where to do the transform call_cdce > does and the one Richard wants to add. For example we "lower" > posix_memalign as early as GIMPLE lowering (that's before CFG construction). > We also lower sincos to cexpi during GENERIC folding (or if that is dropped > either GIMPLE lowering or GIMPLE folding during gimplification would be > appropriate). > > Now, with offloading we have to avoid creating target dependencies before > LTO stream-out (thus no IFN replacements before that - not sure if > Richards patches have an issue there already).
No, this patch was the earliest point at which we converted to internal functions. The idea was to make code treat ECF_PURE built-in functions and internal functions as being basically equivalent. There's therefore not much benefit to doing a straight replacement of one with the other during either GENERIC or gimple. Instead the series only used internal functions for things that built-in functions couldn't do, specifically: - the case used in this patch, to optimise part of a non-pure built-in function using a pure equivalent. - vector versions of built-in functions. The cfgexpand patch makes sure that pure built-in functions are expanded like internal functions where possible. > Which would leave us with a lowering stage early in the main > optimization pipeline - I think fold_builtins pass is way too late but > any "folding" pass will do (like forwprop or backprop where the latter > might be better because it might end up computing FP "ranges" to > improve the initial lowering code). This isn't at all related to what backprop is doing though. backprop is about optimising definitions based on information about all uses. Does fold_builtins need to be as late as it is? > Of course call_cdce is as good as long as it still exists. Does this meann that you're not against the patch in principle (i.e. keeping call_cdce for now and extending it in the way that this patch does)? Thanks, Richard