On 2/1/21 12:47 PM, Richard Biener wrote:
> On February 1, 2021 8:34:35 PM GMT+01:00, Jeff Law <l...@redhat.com> wrote:
>>
>> On 1/28/21 1:09 AM, Richard Biener wrote:
>>> On Wed, 27 Jan 2021, Jakub Jelinek wrote:
>>>
>>>> On Wed, Jan 27, 2021 at 03:40:38PM +0100, Richard Biener wrote:
>>>>> The following avoids repeatedly turning VALUE RTXen into
>>>>> sth useful and re-applying a constant offset through get_addr
>>>>> via DSE check_mem_read_rtx.  Instead perform this once for
>>>>> all stores to be visited in check_mem_read_rtx.  This avoids
>>>>> allocating 1.6GB of garbage PLUS RTXen on the PR80960
>>>>> testcase, fixing the memory usage regression from old GCC.
>>>>>
>>>>> Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>> 2021-01-27  Richard Biener  <rguent...@suse.de>
>>>>>
>>>>>   PR rtl-optimization/80960
>>>>>   * dse.c (check_mem_read_rtx): Call get_addr on the
>>>>>   offsetted address.
>>>>> ---
>>>>>  gcc/dse.c | 5 +++++
>>>>>  1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/gcc/dse.c b/gcc/dse.c
>>>>> index c88587e7d94..da0df54a2dd 100644
>>>>> --- a/gcc/dse.c
>>>>> +++ b/gcc/dse.c
>>>>> @@ -2219,6 +2219,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t
>> bb_info)
>>>>>      }
>>>>>    if (maybe_ne (offset, 0))
>>>>>      mem_addr = plus_constant (get_address_mode (mem), mem_addr,
>> offset);
>>>>> +  /* Avoid passing VALUE RTXen as mem_addr to
>> canon_true_dependence
>>>>> +     which will over and over re-create proper RTL and re-apply
>> the
>>>>> +     offset above.  See PR80960 where we almost allocate 1.6GB of
>> PLUS
>>>>> +     RTXen that way.  */
>>>>> +  mem_addr = get_addr (mem_addr);
>>>>>  
>>>>>    if (group_id >= 0)
>>>>>      {
>>>> Does that result in any changes on how much does DSE optimize?
>>>> I mean, if you do 2 bootstraps/regtests, one with this patch and
>> another one
>>>> without it, and at the end of rest_of_handle_dse dump
>>>> locally_deleted, globally_deleted
>>>> for each CU/function, do you get the same counts except perhaps for
>> dse.c?
>>> So I see no difference for stage2-gcc/*.o dse1/dse2 with/without the
>>> patch but counts are _extremely_ small.  Statistics:
>>>
>>>   70148 dse: local deletions = 0, global deletions = 0
>>>      32 dse: local deletions = 0, global deletions = 1
>>>       9 dse: local deletions = 0, global deletions = 2
>>>       7 dse: local deletions = 0, global deletions = 3
>>>       2 dse: local deletions = 0, global deletions = 4
>>>       2 dse: local deletions = 0, global deletions = 5
>>>       3 dse: local deletions = 0, global deletions = 7
>>>      67 dse: local deletions = 1, global deletions = 0
>>>       1 dse: local deletions = 1, global deletions = 2
>>>      12 dse: local deletions = 2, global deletions = 0
>>>       1 dse: local deletions = 24, global deletions = 1
>>>       2 dse: local deletions = 3, global deletions = 0
>>>       4 dse: local deletions = 4, global deletions = 0
>>>       4 dse: local deletions = 6, global deletions = 0
>>>       1 dse: local deletions = 7, global deletions = 0
>>>       1 dse: local deletions = 8, global deletions = 0
>>>
>>> so not sure how much confidence this brings over the analytical
>>> reasoning that it shouldn't make a difference ...
>>>
>>> stats on just dse2 are even more depressing (given it's cost)
>>>
>>>   35123 dse: local deletions = 0, global deletions = 0
>>>       2 dse: local deletions = 0, global deletions = 1
>>>      20 dse: local deletions = 1, global deletions = 0
>>>       1 dse: local deletions = 2, global deletions = 0
>>>       1 dse: local deletions = 3, global deletions = 0
>>>       1 dse: local deletions = 4, global deletions = 0
>> Based on that, I'd argue that DSE2 should go away and DSE1 should be
>> evaluated for the chopping block.  While RTL DSE was marginally
>> important in 1999 when it was first submitted, the tree-ssa pipeline as
>> a whole has probably made RTL DSE largely pointless.
> True. Though I'd argue that DSE2 might be the conceptually more useful pass 
> since it sees spill slots. 
True in concept, but I bet that the SSA pipeline has made this much less
common in RTL DSE than it was 20+ years ago.  Our allocator and reloader
are much improved as well which would further decrease the number of
opportunities.

I'd hazard a guess that what's left are locals that need to be
addressable and some optimization in the RTL pipeline exposed a dead
store that wasn't otherwise visible in the SSA pipeline.  BUt the only
way to be sure would be to dig into them.

jeff

Reply via email to