On 02/18/2016 02:56 AM, Richard Biener wrote:
Just a short quick comment - the above means you only handle partial
stores
with no interveaning uses.  You don't handle, say

struct S { struct R { int x; int y; } r; int z; } s;

   s = { {1, 2}, 3 };
   s.r.x = 1;
   s.r.y = 2;
   struct R r = s.r;
   s.z = 3;

where s = { {1, 2}, 3} is still dead.

Right.  But handling that has never been part of DSE's design goals. Once
there's a use, DSE has always given up.

Yeah, which is why I in the end said we need a "better" DSE ...
So I cobbled up a quick test for this. I only handle assignments which may reference the same memory as the currently tracked store. Obviously that could be extended to handle certain builtins and the like.

It triggers a bit here and there. While looking at those cases it occurred to me that, we could also look at this as a failure earlier in the optimization pipeline. In fact DOM already has code to handle a closely related situation.

When DOM sees a store to memory, it creates a new fake statement with the RHS and LHS reversed. So in the case above DOM creates statements that look like:

1 = s.r.x
2 = s.r.y

DOM then puts the RHS into the available expression table as equivalent to the LHS. So if it finds a later load of s.r.x, it will replace that load with "1". I haven't looked at it in a while, but it certainly was functional prior to the tuple merge.

Presumably DOM is not looking at r = s.r and realizing it could look s.r piece-wise in the available expression table. If it did, it would effectively turn that fragment into:

    s = { {1, 2}, 3 };
    s.r.x = 1;
    s.r.y = 2;
    struct R r = {1, 2}
    s.z = 3;

At which point we no longer have the may-read of s.r.{x,y} and DSE would see the initial assignment as dead.

I'm not sure if it's advisable to teach DOM how to lookup structure references piecewise or not. The code to handle this case in DSE is relatively simple, so perhaps we just go with the DSE variant.

I also looked a bit at cases where we find that while an entire store (such as an aggregate initialization or mem*) may not be dead, pieces of the store may be dead. That's trivial to detect. It triggers relatively often. The trick is once detected, we have to go back and rewrite the original statement to only store the live parts. I've only written the detection code, the rewriting might be somewhat painful.

I'm starting to wonder if what we have is a 3-part series.

[1/3] The basic tracking to handle 33562, possibly included in gcc-6
[2/3] Ignore reads that reference stuff not in live_bytes for gcc-7
[3/3] Detect partially dead aggregate stores, rewriting the partially
      dead store to only store the live bytes.  Also for gcc-7.


Obviously [1/3] would need compile-time benchmarking, but I really don't expect any issues there.

Jeff

Reply via email to