On Tue, 2011-06-14 at 15:39 +0200, Richard Guenther wrote:
> On Fri, Jun 10, 2011 at 5:11 PM, William J. Schmidt
> <wschm...@linux.vnet.ibm.com> wrote:
> > On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
> >> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
> >> <wschm...@linux.vnet.ibm.com> wrote:
> >
> > <snip>
> >
> >> >> > Loss of aliasing information
> >> >> > ============================
> >> >> > The most serious problem I've run into is degraded performance due to 
> >> >> > poorer
> >> >> > instruction scheduling choices. I tracked this down to
> >> >> > alias.c:nonoverlapping_component_refs_p.
> >> >> >
> >> >> > This code proves that two memory accesses don't overlap by attempting 
> >> >> > to prove
> >> >> > that they access different fields of the same structure. This is done 
> >> >> > using
> >> >> > the MEM_EXPRs of the two rtx's, which record the expression trees 
> >> >> > that were
> >> >> > translated into the rtx's during expand. When address lowering is not
> >> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR: x.a, for
> >> >> > example. However, address lowering changes the simple COMPONENT_REF 
> >> >> > into a
> >> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
> >> >> > reference. Thus the aliasing machinery can no longer prove that two 
> >> >> > such
> >> >> > field references are disjoint.
> >> >> >
> >> >> > This has severe consequences for performance, and has to be dealt 
> >> >> > with if
> >> >> > address lowering is to be successful.
> >> >> >
> >> >> > I've worked around this with an admittedly fragile solution; I'll 
> >> >> > discuss the
> >> >> > drawbacks below. The idea is to construct a mapping from replacement 
> >> >> > mem_refs
> >> >> > to the original expressions that they replaced. When a MEM_EXPR is 
> >> >> > being set
> >> >> > during expand, we first look up the mem_ref in the mapping. If 
> >> >> > present, the
> >> >> > MEM_EXPR is set to the original expression, rather than to the 
> >> >> > mem_ref. This
> >> >> > essentially duplicates the behavior in the absence of address 
> >> >> > lowering.
> >> >>
> >> >> Ick. We had this in the past via TMR_ORIGINAL which caused all sorts
> >> >> of problems. Removing it didn't cause much degradation because we now
> >> >> preserve points-to information.
> >> >>
> >> >> Originally I played with lowering all memory accesses to MEM_REFs
> >> >> (see the old mem-ref branch), and the loss of type-based alias
> >> >> disambiguation was indeed an issue.
> >> >>
> >> >> But - I definitely do not like the idea of preserving something similar
> >> >> to TMR_ORIGINAL. Instead we can try preserving some information
> >> >> we derive from it. We keep the original access type that we can use
> >> >> for TBAA but do not retain knowledge on whether the type of the
> >> >> MEM_REF is valid for TBAA or if it is view-converted.
> >> >
> >> > Yes, I really don't like what I have at the moment, either. I put it in
> >> > place as a stopgap to let me proceed to look for other performance
> >> > problems.
> >> >
> >> > The question is how we can infer useful information for TBAA from the
> >> > MEM_REFs and TMRs. I poked at trying to identify types and offsets from
> >> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
> >> > many cases to maintain correctness, and couldn't prove the type
> >> > information for the important cases in SPEC I was trying to address.
> >> >
> >> > Unfortunately, the whole design goes down the drain if we can't find a
> >> > way to solve the TBAA issue. The performance degradations are too
> >> > costly.
> >>
> >> If you look at what basic TBAA the alias oracle performs then it boils
> >> down to the fact that get_alias_set for a.b.c might end up using the
> >> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
> >> of the type of a. The tree alias-oracle extracts both alias sets, that
> >> of the outermost valid type and that of the innermost as both are
> >> equally useful. But the MEM_REF (or TARGET_MEM_REF) tree
> >> only have storage for one such alias-set. Thus my idea at some point
> >> was to store the other one as well in some form. It will not be
> >> the full information (after all, the complete access path does provide
> >> some extra information - see aliasing_component_refs_p).
> >
> > This is what concerns me. TBAA information for the outer and inner
> > components doesn't seem sufficient to provide what
> > nonoverlapping_component_refs_p is currently able to prove. The latter
> > searches for a common RECORD_TYPE somewhere along the two access paths,
> > and then disambiguates if the two associated referenced fields differ.
> > For a simple case like "struct x { int a; int b; };", a and b have the
> > same type and alias-set, so the alias-set information doesn't add
> > anything. It isn't sufficient alone for the disambiguation of x1.a =
> > MEM_REF[&x1, 0] and x2.b = MEM_REF[&x2, 4].
> >
> > Obviously the offset is sufficient to disambiguate for this simple case
> > with a common base type, but when the shared record types aren't at the
> > outermost level, we can't detect whether it is.
> >
> > At the moment I don't see how we can avoid degradation unless we keep
> > the full access path around somewhere, for [TARGET_]MEM_REFs built from
> > COMPONENT_REFs. I hope I'm wrong.
> 
> You are not wrong.  But the question is, does it make a difference?
> 
After some rework and measurements, I'm going to change my answer to
this question.  The loss of same-type field disambiguation does make a
difference under limited circumstances.  But at this time, I don't think
those differences are significant enough to warrant the complexities
required to avoid them.

The changes you made a couple weeks ago to tree-ssa-address.c, and the
associated aliasing changes, caused me to remove some heuristics and add
others.  My measurements changed noticeably as a result.  At this point,
the address-lowering patch I've been working on has detrimental effects
(on powerpc64) only on 188.ammp.  As discussed before, this is caused by
the lack of field disambiguation during scheduling of a large hot loop
with high register pressure.  However, the degradation I am seeing is
now roughly 2%.

So with only 2% degradation on a single benchmark, my thought is to add
some commentary to the code about this concern, but not to attempt to
address it.  I would include the salient points from this discussion in
the commentary.

Sound reasonable?

Bill

> Richard.


Reply via email to