On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt <wschm...@linux.vnet.ibm.com> wrote: > > On Tue, 2011-06-07 at 12:06 +0200, Richard Guenther wrote: >> On Mon, Jun 6, 2011 at 8:07 PM, William J. Schmidt >> <wschm...@linux.vnet.ibm.com> wrote: > > <snip> > >> > * If the original expression will be recognized as a "hidden global >> > store" in >> > tree-ssa-sink.c:is_hidden_global_store, but the replacement expression >> > will >> > not, it is possible for the dead code eliminator to remove the modified >> > statement. It seems to me this shouldn't normally happen in practice. >> > For >> > now I detect this case and refuse to do the replacement, but I suspect a >> > possible front-end or upstream-optimization problem here. The test case >> > that fails here is libstdc++-v3/testsuite/23_containers/vector/ >> > ext_pointer_modifiers/insert.cc. More investigation required. >> >> That indeed sounds odd. > > When I looked into it, the addressing expression was fairly complex > initially, with templates and inheritance involved. The code to create > the affine tree combination was able to collapse a great deal of the > arithmetic and produce something much simpler that no longer referenced > the item that had made it look like a global store originally. (I.e., > buried in the expression were an (&x + a) and a -(&x + a) that cancelled > out and removed all traces of x.) It appeared to me that this was being > done correctly, but it was a very complex expression and I might have > missed something. > > The result was that essentially the entire procedure ended up going > dead. It's possible that some information was either lost or not > provided by the front end that should have prevented that. > > <snip> > >> > Loss of aliasing information >> > ============================ >> > The most serious problem I've run into is degraded performance due to >> > poorer >> > instruction scheduling choices. I tracked this down to >> > alias.c:nonoverlapping_component_refs_p. >> > >> > This code proves that two memory accesses don't overlap by attempting to >> > prove >> > that they access different fields of the same structure. This is done >> > using >> > the MEM_EXPRs of the two rtx's, which record the expression trees that were >> > translated into the rtx's during expand. When address lowering is not >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR: x.a, for >> > example. However, address lowering changes the simple COMPONENT_REF into a >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field >> > reference. Thus the aliasing machinery can no longer prove that two such >> > field references are disjoint. >> > >> > This has severe consequences for performance, and has to be dealt with if >> > address lowering is to be successful. >> > >> > I've worked around this with an admittedly fragile solution; I'll discuss >> > the >> > drawbacks below. The idea is to construct a mapping from replacement >> > mem_refs >> > to the original expressions that they replaced. When a MEM_EXPR is being >> > set >> > during expand, we first look up the mem_ref in the mapping. If present, >> > the >> > MEM_EXPR is set to the original expression, rather than to the mem_ref. >> > This >> > essentially duplicates the behavior in the absence of address lowering. >> >> Ick. We had this in the past via TMR_ORIGINAL which caused all sorts >> of problems. Removing it didn't cause much degradation because we now >> preserve points-to information. >> >> Originally I played with lowering all memory accesses to MEM_REFs >> (see the old mem-ref branch), and the loss of type-based alias >> disambiguation was indeed an issue. >> >> But - I definitely do not like the idea of preserving something similar >> to TMR_ORIGINAL. Instead we can try preserving some information >> we derive from it. We keep the original access type that we can use >> for TBAA but do not retain knowledge on whether the type of the >> MEM_REF is valid for TBAA or if it is view-converted. > > Yes, I really don't like what I have at the moment, either. I put it in > place as a stopgap to let me proceed to look for other performance > problems. > > The question is how we can infer useful information for TBAA from the > MEM_REFs and TMRs. I poked at trying to identify types and offsets from > the MEM_EXPRs, but this ended up being useless; I had to constrain too > many cases to maintain correctness, and couldn't prove the type > information for the important cases in SPEC I was trying to address. > > Unfortunately, the whole design goes down the drain if we can't find a > way to solve the TBAA issue. The performance degradations are too > costly.
If you look at what basic TBAA the alias oracle performs then it boils down to the fact that get_alias_set for a.b.c might end up using the alias-set of the type of C but for MEM[&a + 4] it will use the alias set of the type of a. The tree alias-oracle extracts both alias sets, that of the outermost valid type and that of the innermost as both are equally useful. But the MEM_REF (or TARGET_MEM_REF) tree only have storage for one such alias-set. Thus my idea at some point was to store the other one as well in some form. It will not be the full information (after all, the complete access path does provide some extra information - see aliasing_component_refs_p). Btw, I'm looking at lowering bitfield accesses to read-modify-write cycles and in that context also to lowering unaligned accesses for targets that do not support them. Richard. > <snip> > > Thanks, > Bill > >