Hi Richard,

Thanks for the comments -- a few responses below.
On Tue, 2011-10-11 at 13:40 +0200, Richard Guenther wrote:
> On Sat, 8 Oct 2011, William J. Schmidt wrote:
>
<snip>
>
> > +      c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
>
> You don't verify that bitpos % BITS_PER_UNIT is zero anywhere.

I'll add a check in the caller.  I was thinking this was unnecessary
since I had excluded bitfield operations, but on reflection that may
not be sufficient.

<snip>
>
> > +  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
> > +                                        true, GSI_SAME_STMT);
> > +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1,
> > +                          mult_expr);
> > +  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
> > +                                       true, GSI_SAME_STMT);
> > +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
> > +                         build_int_cst (offset_type, double_int_to_shwi (c)));
>
> double_int_to_tree (offset_type, c)
>
> Please delay gimplification to the caller, that way this function
> solely operates on the trees returned from get_inner_reference.
> Or are you concerned that fold might undo your association?

I'll try that.  I was just basing this on some suggestions you had made
earlier; I don't believe there is any problem with delaying it.

<snip>
>
> >    for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
> >      {
> >        gimple stmt = gsi_stmt (gsi);
> >
> > -      if (is_gimple_assign (stmt)
> > -          && !stmt_could_throw_p (stmt))
> > +      /* During late reassociation only, look for restructuring
> > +         opportunities within an expression that references memory.
> > +         We only do this for blocks not contained in loops, since the
> > +         ivopts machinery does a good job on loop expressions, and we
> > +         don't want to interfere with other loop optimizations.  */
>
> I'm not sure I buy this.  IVOPTs would have produced [TARGET_]MEM_REFs
> which you don't handle.  Did you do any measurements what happens if
> you enable it generally?
Actually, I agree with you -- in an earlier iteration this was still
enabled for reassoc1, ahead of the loop optimizations, and was causing
degradations.  So long as it doesn't occur early, it should be fine to
do this everywhere now, and catch the non-ivar cases in loops.

<snip>
>
> You verified the patch has no performance degradations on ppc64
> for SPEC CPU, did you see any improvements?
>

Yes, a few in the 2-3% range.  Nothing stellar.

> The pattern matching is still very ad-hoc and doesn't consider
> statements that feed the base address.  There is conceptually
> no difference between p->a[n] and *(p + n * 4).

That's true.  Since we abandoned the general address-lowering approach,
this patch was aimed at the specific pattern that comes up frequently
in practice.  I would expect the *(p + n * 4) cases to be handled by
the general straight-line strength reduction, which is the correct
long-term approach.  (Cases like p->a[n], where the multiplication is
not yet explicit, will be a bit of a wart as part of strength
reduction, too, but that's still the right place for it eventually.)

> You placed this
> lowering in reassoc to catch CSE opportunities with DOM, right?
> Does RTL CSE not do it's job or is the transform undone by
> fwprop before it gets a chance to do it?

I think with Paolo's suggested patch for RTL CSE, this could be moved
back to expand; I will have to experiment with it again to make sure.
If so, that would certainly be my preference as well.  (Or having the
whole problem just disappear might be my preference on some days... :)

Thanks,
Bill