On 02/14/2016 11:38 AM, Richard Biener wrote:
On February 14, 2016 5:35:13 PM GMT+01:00, Jeff Law <l...@redhat.com> wrote:
On 02/12/2016 10:21 AM, Jeff Law wrote:
But really we simply need a better DSE algorithm.
So the way to fix DSE is to keep the existing algorithm and track the
hunks of Complex/aggregates that have been set a second time.
My first thought was to implement this as an inverted bitmap. ie,
set
it to 1 for every byte in the complex/aggregate that is set by the
first
store. Clear bits for each byte in subsequent stores to the pieces.
If
the bitmap reaches an empty state, then the initial store is dead.
Adjusting *ref could work too, but would probably be painful if the
subsequent stores don't happen in a convenient order.
So that was scary easy. We should have done this a long time ago.
Essentially I call ao_get_ref_base to get the offset/size/max_size
filled in for the first statement. Those are used to initialize the
live bytes bitfield, as long as max_size != -1.
Then when we have a possible killing statement, we use the ao in a
similar manner to determine which bytes to clear (taking care that the
base is the same between the two references and that in the killing
statement that the size/max_size are the same.).
When all the live bytes are zero then we've killed the original
statement.
It's ~20 lines of code.
I need to pull together some additional tests, but it looks likely
we'll
be able to wrap this up easily for gcc-6.
BTW, we had sth like this before but it had both correctness and more
importantly scalability issues.
If you're referring to 48141, that was a completely different issue and
I don't think there's any significant parallels between how that
affected RTL DSE and tree DSE.
The issue in 48141 is that RTL DSE keeps a list of active stores. Those
lists were getting long, often with things that just weren't interesting.
Tree DSE doesn't maintain a list of that nature (it used to, but you
fixed all that a few years back). These days it does a walk of the IL.
When it finds a memory store, it walks immediate uses of the virtual
operands to see if an immediate use happens to write the same memory
location. If it does, then the first write is dead. That's (of course)
over-simplified, but captures the essence of how tree DSE works.
As long as the calls to ao_ref_init/ao_ref are cheap, my code shouldn't
change the overall performance of tree DSE.
Jeff