Re: [PATCH V1] Common infrastructure for load-store fusion for aarch64 and rs6000 target

Ajit Agarwal Wed, 14 Feb 2024 09:57:02 -0800

Hello Sam:

On 14/02/24 10:50 pm, Sam James wrote:
> 
> Ajit Agarwal <[email protected]> writes:
> 
>> Hello Richard:
>>
>>
>> On 14/02/24 4:03 pm, Richard Sandiford wrote:
>>> Hi,
>>>
>>> Thanks for working on this.
>>>
>>> You posted a version of this patch on Sunday too.  If you need to repost
>>> to fix bugs or make other improvements, could you describe the changes
>>> that you've made since the previous version?  It makes things easier
>>> to follow.
>>
>> Sure. Sorry for that I forgot to add that.
>>
>>>
>>> Also, sorry for starting with a meta discussion about reviews, but
>>> there are multiple types of review comment, including:
>>>
>>> (1) Suggestions for changes that are worded as suggestions.
>>>
>>> (2) Suggestions for changes that are worded as questions ("Wouldn't it be
>>>     better to do X?", etc).
>>>
>>> (3) Questions asking for an explanation or for more information.
>>>
>>> Just sending a new patch makes sense when the previous review comments
>>> were all like (1), and arguably also (1)+(2).  But Alex's previous review
>>> included (3) as well.  Could you go back and respond to his questions there?
>>> It would help understand some of the design choices.
>>>
>>
>> I have responded to Alex comments for the previous patches.
>> I have incorporated all of his comments in this patch.
>>
>>  
>>> A natural starting point when reviewing a patch like this is to diff
>>> the current aarch64-ldp-fusion.cc with the new pair-fusion.cc.  This shows
>>> many of the kind of changes that I'd expect.  But it also seems to include
>>> some code reordering, such as putting fuse_pair after try_fuse_pair.
>>> If some reordering is necessary, could you try to organise the patch as
>>> a series in which the reordering is a separate step?  It's a bit hard
>>> to review at the moment.  (Reordering for cosmetic reasons is also OK,
>>> but again please separate it out for ease of review.)
>>>
>>> Maybe one way of making the review easier would be to split the aarch64
>>> pass into the "target-dependent" and "target-independent" pieces
>>> in-place, i.e. keeping everything within aarch64-ldp-fusion.cc, and then
>>> (as separate patches) move the target-independent pieces outside
>>> config/aarch64.
>>>
>> Sure I will do that.
>>
>>> The patch includes:
>>>
>>>>    * emit-rtl.cc: Modify ge with gt on PolyINT data structure.
>>>>    * dce.cc: Add changes not to delete the load store pair.
>>>>    * rtl-ssa/changes.cc: Modified assert code.
>>>>    * var-tracking.cc: Modified assert code.
>>>>    * df-problems.cc: Not to generate REG_UNUSED for multi
>>>>    word registers that is requied for rs6000 target.
>>>
>>> Please submit these separately, as independent preparatory patches,
>>> with an explanation for why they're needed & correct.  But:
>>>
>> Sure I will do that.
>>
>>>> diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
>>>> index 88ee0dd67fc..a8d0ee7c4db 100644
>>>> --- a/gcc/df-problems.cc
>>>> +++ b/gcc/df-problems.cc
>>>> @@ -3360,7 +3360,7 @@ df_set_unused_notes_for_mw (rtx_insn *insn, struct 
>>>> df_mw_hardreg *mws,
>>>>    if (df_whole_mw_reg_unused_p (mws, live, artificial_uses))
>>>>      {
>>>>        unsigned int regno = mws->start_regno;
>>>> -      df_set_note (REG_UNUSED, insn, mws->mw_reg);
>>>> +      //df_set_note (REG_UNUSED, insn, mws->mw_reg);
>>>>        dead_debug_insert_temp (debug, regno, insn, 
>>>> DEBUG_TEMP_AFTER_WITH_REG);
>>>>  
>>>>        if (REG_DEAD_DEBUGGING)
>>>> @@ -3375,7 +3375,7 @@ df_set_unused_notes_for_mw (rtx_insn *insn, struct 
>>>> df_mw_hardreg *mws,
>>>>    if (!bitmap_bit_p (live, r)
>>>>        && !bitmap_bit_p (artificial_uses, r))
>>>>      {
>>>> -      df_set_note (REG_UNUSED, insn, regno_reg_rtx[r]);
>>>> +     // df_set_note (REG_UNUSED, insn, regno_reg_rtx[r]);
> 
> I just want to emphasise here:
> a) adding out commented code is very unusual (I know a reviewer picked
> up on that already);
> 
> b) if you are going to comment something out as a hack / you need help,
> please *clearly flag that* (apologies if I missed it), and possibly add
> a comment above it saying "// TODO: Need to figure out ...." or similar,
> otherwise it just looks like it was forgotten about.
> 
> In this case, your question about how to handle REG_UNUSED should've
> been made clear in a summary at the top where you mention the
> outstanding items. Again, sorry if I missed it.
>


Question is not about how to handle REG_UNUSED,

I am afraid this is not what I meant, I wanted to
convey the following.

REG_UNSED generated by ira with the above code used by cprop_hardreg to remove 
the code with REG_UNUSED notes.

We can modify these passes to handle REG_UNUSED 
differently or not to generate REG_UNUSED for
multi-word case as above. 

We use similar to we do as follows:

We generate sequential registers using (subreg V16QI (reg 00mode) 16) and (reg 
OOmode 0) where OOmode is 256 bit and V16QI is 128 bits in order to generate 
sequential register pair. If I keep the above REG_UNUSED notes ira generates 
REG_UNUSED and in cprop_harreg pass and dce pass deletes store pairs and we get 
incorrect code.

I choose not to generate REG_UNUSED for multi-word case instead of handling in 
different passes lower the pipeline.

Thanks & Regards
Ajit
>>>>        dead_debug_insert_temp (debug, r, insn, DEBUG_TEMP_AFTER_WITH_REG);
>>>>        if (REG_DEAD_DEBUGGING)
>>>>          df_print_note ("adding 2: ", insn, REG_NOTES (insn));
>>>> @@ -3493,9 +3493,9 @@ df_create_unused_note (rtx_insn *insn, df_ref def,
>>>>    || bitmap_bit_p (artificial_uses, dregno)
>>>>    || df_ignore_stack_reg (dregno)))
>>>>      {
>>>> -      rtx reg = (DF_REF_LOC (def))
>>>> -                ? *DF_REF_REAL_LOC (def): DF_REF_REG (def);
>>>> -      df_set_note (REG_UNUSED, insn, reg);
>>>> +      //rtx reg = (DF_REF_LOC (def))
>>>> +      //          ? *DF_REF_REAL_LOC (def): DF_REF_REG (def);
>>>> +      //df_set_note (REG_UNUSED, insn, reg);
>>>>        dead_debug_insert_temp (debug, dregno, insn, 
>>>> DEBUG_TEMP_AFTER_WITH_REG);
>>>>        if (REG_DEAD_DEBUGGING)
>>>>    df_print_note ("adding 3: ", insn, REG_NOTES (insn));
>>>
>>> I don't think this can be right.  The last hunk of the var-tracking.cc
>>> patch also seems to be reverting a correct change.
>>>
>>
>> We generate sequential registers using (subreg V16QI (reg 00mode) 16)
>> and (reg OOmode 0)
>> where OOmode is 256 bit and V16QI is 128 bits in order to generate
>> sequential register pair. If I keep the above REG_UNUSED notes ira generates
>> REG_UNUSED and in cprop_harreg pass and dce pass deletes store pairs and
>> we get incorrect code.
>>
>> By commenting REG_UNUSED notes it is not generated and we get the correct 
>> store
>> pair fusion and cprop_hardreg and dce doesn't deletes them.
>>
>> Please let me know is there are better ways to address this instead of 
>> commenting
>> above generation of REG_UNUSED notes.
>>
>> Thanks & Regards
>> Ajit
>>> Thanks,
>>> Richard
>

Re: [PATCH V1] Common infrastructure for load-store fusion for aarch64 and rs6000 target

Reply via email to