A similar situation arises outside of loops as well. PRE commons up address computations without knowing that advanced addressing modes exist, which results in unnecessary address computation instructions. The forward substitution code uses local heuristics and looks at each use individually; it does not know whether the propagation will succeed for all uses (and therefore expose a DCE opportunity), so a precise cost estimate is not available. Even so, for such cases a simple change of 'gain > 0' to 'gain >= 0' in should_replace_address_p can do the job.
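To make the 'gain > 0' versus 'gain >= 0' point concrete, here is a minimal toy model. It is not GCC code: the address kinds, the cost table, and every toy_* name are made up for illustration, and it does not attempt to reproduce what the real function in fwprop.c does. It only shows how the comparison behaves when the old and new address forms cost the same.

```c
#include <stdbool.h>

/* Toy address shapes -- hypothetical, NOT GCC's RTL representation.  */
enum toy_addr_kind { TOY_REG, TOY_REG_PLUS_CONST, TOY_REG_PLUS_REG };

/* Made-up per-target address cost table, indexed by toy_addr_kind.  */
struct toy_target { int cost[3]; };

/* Gain from replacing OLD_KIND with NEW_KIND; positive means the new
   address is cheaper, zero means both cost the same.  */
static int
toy_gain (const struct toy_target *t,
          enum toy_addr_kind old_kind, enum toy_addr_kind new_kind)
{
  return t->cost[old_kind] - t->cost[new_kind];
}

/* Strict test: substitute only when the new address is cheaper.  */
static bool
toy_should_replace_strict (const struct toy_target *t,
                           enum toy_addr_kind old_kind,
                           enum toy_addr_kind new_kind)
{
  return toy_gain (t, old_kind, new_kind) > 0;
}

/* Relaxed test: also substitute when the costs are equal, in the hope
   that the instruction computing the old address becomes dead.  */
static bool
toy_should_replace_relaxed (const struct toy_target *t,
                            enum toy_addr_kind old_kind,
                            enum toy_addr_kind new_kind)
{
  return toy_gain (t, old_kind, new_kind) >= 0;
}
```

On a hypothetical target where reg and reg+const addresses cost the same, the strict test refuses to fold "t = base + 8; load [t]" into "load [base + 8]", while the relaxed test allows it. On a target where the more complex form is costlier, both tests still refuse.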
For the LIM case discussed in this thread, it is trickier to estimate the cost of forward substitution without knowing the register pressure. Forward prop MAY increase the live range of the propagated value (the RHS), even though in this case it does not; here it actually shrinks the live ranges of the LHS temps, thus reducing register pressure overall. I submitted a live range shrink (LRS) patch a while back, but it was not accepted. This address computation propagation can be easily implemented in the LRS pass with precise knowledge of the change in register pressure.

In general it will be tricky for later passes to clean up the mess. The fundamental problem is that the address computation is exposed to PRE prematurely (for a given target) at the GIMPLE level. In this case, if the INDIRECT_REFs were expressed as MEM_REFs, the problem might be avoided. A similar issue (for ARM) is reported in bug 40956.

Thanks,

David

On Wed, Dec 23, 2009 at 10:06 AM, Paolo Bonzini <bonz...@gnu.org> wrote:
>
> On 12/23/2009 06:47 PM, H.J. Lu wrote:
>>
>> On Wed, Dec 23, 2009 at 8:41 AM, Paolo Bonzini<bonz...@gnu.org> wrote:
>>>
>>> On 12/23/2009 04:19 PM, Bingfeng Mei wrote:
>>>>
>>>> It seems that just commenting out this check in fwprop.c should work.
>>>
>>> Yes, but it would pessimize x86.
>>>
>>
>> Is there a bug open for x86? Can't we make it target dependent, something
>> like
>>
>> /* Do not propagate loop invariant definitions inside the loop. */
>> if (targetm.foobar
>>     && DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father)
>>   return;
>
> I'll open a bug. The solution is to actually understand what the address
> costs are on x86 (apparently it's not true that the more complex addressing
> modes are always better, probably because of instruction sizes), not to add a
> target macro.
>
> Paolo