On 2/1/21 12:47 PM, Richard Biener wrote:
> On February 1, 2021 8:34:35 PM GMT+01:00, Jeff Law <l...@redhat.com> wrote:
>>
>> On 1/28/21 1:09 AM, Richard Biener wrote:
>>> On Wed, 27 Jan 2021, Jakub Jelinek wrote:
>>>
>>>> On Wed, Jan 27, 2021 at 03:40:38PM +0100, Richard Biener wrote:
>>>>> The following avoids repeatedly turning VALUE RTXen into
>>>>> sth useful and re-applying a constant offset through get_addr
>>>>> via DSE check_mem_read_rtx. Instead perform this once for
>>>>> all stores to be visited in check_mem_read_rtx. This avoids
>>>>> allocating 1.6GB of garbage PLUS RTXen on the PR80960
>>>>> testcase, fixing the memory usage regression from old GCC.
>>>>>
>>>>> Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>> 2021-01-27 Richard Biener <rguent...@suse.de>
>>>>>
>>>>> PR rtl-optimization/80960
>>>>> * dse.c (check_mem_read_rtx): Call get_addr on the
>>>>> offsetted address.
>>>>> ---
>>>>> gcc/dse.c | 5 +++++
>>>>> 1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/gcc/dse.c b/gcc/dse.c
>>>>> index c88587e7d94..da0df54a2dd 100644
>>>>> --- a/gcc/dse.c
>>>>> +++ b/gcc/dse.c
>>>>> @@ -2219,6 +2219,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t
>> bb_info)
>>>>> }
>>>>> if (maybe_ne (offset, 0))
>>>>> mem_addr = plus_constant (get_address_mode (mem), mem_addr,
>> offset);
>>>>> + /* Avoid passing VALUE RTXen as mem_addr to
>> canon_true_dependence
>>>>> + which will over and over re-create proper RTL and re-apply
>> the
>>>>> + offset above. See PR80960 where we almost allocate 1.6GB of
>> PLUS
>>>>> + RTXen that way. */
>>>>> + mem_addr = get_addr (mem_addr);
>>>>>
>>>>> if (group_id >= 0)
>>>>> {
>>>> Does that result in any changes on how much does DSE optimize?
>>>> I mean, if you do 2 bootstraps/regtests, one with this patch and
>> another one
>>>> without it, and at the end of rest_of_handle_dse dump
>>>> locally_deleted, globally_deleted
>>>> for each CU/function, do you get the same counts except perhaps for
>> dse.c?
>>> So I see no difference for stage2-gcc/*.o dse1/dse2 with/without the
>>> patch but counts are _extremely_ small. Statistics:
>>>
>>> 70148 dse: local deletions = 0, global deletions = 0
>>> 32 dse: local deletions = 0, global deletions = 1
>>> 9 dse: local deletions = 0, global deletions = 2
>>> 7 dse: local deletions = 0, global deletions = 3
>>> 2 dse: local deletions = 0, global deletions = 4
>>> 2 dse: local deletions = 0, global deletions = 5
>>> 3 dse: local deletions = 0, global deletions = 7
>>> 67 dse: local deletions = 1, global deletions = 0
>>> 1 dse: local deletions = 1, global deletions = 2
>>> 12 dse: local deletions = 2, global deletions = 0
>>> 1 dse: local deletions = 24, global deletions = 1
>>> 2 dse: local deletions = 3, global deletions = 0
>>> 4 dse: local deletions = 4, global deletions = 0
>>> 4 dse: local deletions = 6, global deletions = 0
>>> 1 dse: local deletions = 7, global deletions = 0
>>> 1 dse: local deletions = 8, global deletions = 0
>>>
>>> so not sure how much confidence this brings over the analytical
>>> reasoning that it shouldn't make a difference ...
>>>
>>> stats on just dse2 are even more depressing (given it's cost)
>>>
>>> 35123 dse: local deletions = 0, global deletions = 0
>>> 2 dse: local deletions = 0, global deletions = 1
>>> 20 dse: local deletions = 1, global deletions = 0
>>> 1 dse: local deletions = 2, global deletions = 0
>>> 1 dse: local deletions = 3, global deletions = 0
>>> 1 dse: local deletions = 4, global deletions = 0
>> Based on that, I'd argue that DSE2 should go away and DSE1 should be
>> evaluated for the chopping block. While RTL DSE was marginally
>> important in 1999 when it was first submitted, the tree-ssa pipeline as
>> a whole has probably made RTL DSE largely pointless.
> True. Though I'd argue that DSE2 might be the conceptually more useful pass
> since it sees spill slots.
True in concept, but I bet that the SSA pipeline has made this much less
common in RTL DSE than it was 20+ years ago. Our allocator and reloader
are much improved as well which would further decrease the number of
opportunities.
I'd hazard a guess that what's left are locals that need to be
addressable and some optimization in the RTL pipeline exposed a dead
store that wasn't otherwise visible in the SSA pipeline. BUt the only
way to be sure would be to dig into them.
jeff