On 02/18/2016 02:56 AM, Richard Biener wrote:
Just a short quick comment - the above means you only handle partial
stores
with no interveaning uses. You don't handle, say
struct S { struct R { int x; int y; } r; int z; } s;
s = { {1, 2}, 3 };
s.r.x = 1;
s.r.y = 2;
struct R r = s.r;
s.z = 3;
where s = { {1, 2}, 3} is still dead.
Right. But handling that has never been part of DSE's design goals. Once
there's a use, DSE has always given up.
Yeah, which is why I in the end said we need a "better" DSE ...
So I cobbled up a quick test for this. I only handle assignments which
may reference the same memory as the currently tracked store. Obviously
that could be extended to handle certain builtins and the like.
It triggers a bit here and there. While looking at those cases it
occurred to me that, we could also look at this as a failure earlier in
the optimization pipeline. In fact DOM already has code to handle a
closely related situation.
When DOM sees a store to memory, it creates a new fake statement with
the RHS and LHS reversed. So in the case above DOM creates statements
that look like:
1 = s.r.x
2 = s.r.y
DOM then puts the RHS into the available expression table as equivalent
to the LHS. So if it finds a later load of s.r.x, it will replace that
load with "1". I haven't looked at it in a while, but it certainly was
functional prior to the tuple merge.
Presumably DOM is not looking at r = s.r and realizing it could look s.r
piece-wise in the available expression table. If it did, it would
effectively turn that fragment into:
s = { {1, 2}, 3 };
s.r.x = 1;
s.r.y = 2;
struct R r = {1, 2}
s.z = 3;
At which point we no longer have the may-read of s.r.{x,y} and DSE would
see the initial assignment as dead.
I'm not sure if it's advisable to teach DOM how to lookup structure
references piecewise or not. The code to handle this case in DSE is
relatively simple, so perhaps we just go with the DSE variant.
I also looked a bit at cases where we find that while an entire store
(such as an aggregate initialization or mem*) may not be dead, pieces of
the store may be dead. That's trivial to detect. It triggers
relatively often. The trick is once detected, we have to go back and
rewrite the original statement to only store the live parts. I've only
written the detection code, the rewriting might be somewhat painful.
I'm starting to wonder if what we have is a 3-part series.
[1/3] The basic tracking to handle 33562, possibly included in gcc-6
[2/3] Ignore reads that reference stuff not in live_bytes for gcc-7
[3/3] Detect partially dead aggregate stores, rewriting the partially
dead store to only store the live bytes. Also for gcc-7.
Obviously [1/3] would need compile-time benchmarking, but I really don't
expect any issues there.
Jeff