Ajit Agarwal <aagar...@linux.ibm.com> writes:
> Hello Richard: > > > On 14/02/24 4:03 pm, Richard Sandiford wrote: >> Hi, >> >> Thanks for working on this. >> >> You posted a version of this patch on Sunday too. If you need to repost >> to fix bugs or make other improvements, could you describe the changes >> that you've made since the previous version? It makes things easier >> to follow. > > Sure. Sorry for that I forgot to add that. > >> >> Also, sorry for starting with a meta discussion about reviews, but >> there are multiple types of review comment, including: >> >> (1) Suggestions for changes that are worded as suggestions. >> >> (2) Suggestions for changes that are worded as questions ("Wouldn't it be >> better to do X?", etc). >> >> (3) Questions asking for an explanation or for more information. >> >> Just sending a new patch makes sense when the previous review comments >> were all like (1), and arguably also (1)+(2). But Alex's previous review >> included (3) as well. Could you go back and respond to his questions there? >> It would help understand some of the design choices. >> > > I have responded to Alex comments for the previous patches. > I have incorporated all of his comments in this patch. > > >> A natural starting point when reviewing a patch like this is to diff >> the current aarch64-ldp-fusion.cc with the new pair-fusion.cc. This shows >> many of the kind of changes that I'd expect. But it also seems to include >> some code reordering, such as putting fuse_pair after try_fuse_pair. >> If some reordering is necessary, could you try to organise the patch as >> a series in which the reordering is a separate step? It's a bit hard >> to review at the moment. (Reordering for cosmetic reasons is also OK, >> but again please separate it out for ease of review.) >> >> Maybe one way of making the review easier would be to split the aarch64 >> pass into the "target-dependent" and "target-independent" pieces >> in-place, i.e. keeping everything within aarch64-ldp-fusion.cc, and then >> (as separate patches) move the target-independent pieces outside >> config/aarch64. >> > Sure I will do that. > >> The patch includes: >> >>> * emit-rtl.cc: Modify ge with gt on PolyINT data structure. >>> * dce.cc: Add changes not to delete the load store pair. >>> * rtl-ssa/changes.cc: Modified assert code. >>> * var-tracking.cc: Modified assert code. >>> * df-problems.cc: Not to generate REG_UNUSED for multi >>> word registers that is requied for rs6000 target. >> >> Please submit these separately, as independent preparatory patches, >> with an explanation for why they're needed & correct. But: >> > Sure I will do that. > >>> diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc >>> index 88ee0dd67fc..a8d0ee7c4db 100644 >>> --- a/gcc/df-problems.cc >>> +++ b/gcc/df-problems.cc >>> @@ -3360,7 +3360,7 @@ df_set_unused_notes_for_mw (rtx_insn *insn, struct >>> df_mw_hardreg *mws, >>> if (df_whole_mw_reg_unused_p (mws, live, artificial_uses)) >>> { >>> unsigned int regno = mws->start_regno; >>> - df_set_note (REG_UNUSED, insn, mws->mw_reg); >>> + //df_set_note (REG_UNUSED, insn, mws->mw_reg); >>> dead_debug_insert_temp (debug, regno, insn, >>> DEBUG_TEMP_AFTER_WITH_REG); >>> >>> if (REG_DEAD_DEBUGGING) >>> @@ -3375,7 +3375,7 @@ df_set_unused_notes_for_mw (rtx_insn *insn, struct >>> df_mw_hardreg *mws, >>> if (!bitmap_bit_p (live, r) >>> && !bitmap_bit_p (artificial_uses, r)) >>> { >>> - df_set_note (REG_UNUSED, insn, regno_reg_rtx[r]); >>> + // df_set_note (REG_UNUSED, insn, regno_reg_rtx[r]); I just want to emphasise here: a) adding out commented code is very unusual (I know a reviewer picked up on that already); b) if you are going to comment something out as a hack / you need help, please *clearly flag that* (apologies if I missed it), and possibly add a comment above it saying "// TODO: Need to figure out ...." or similar, otherwise it just looks like it was forgotten about. In this case, your question about how to handle REG_UNUSED should've been made clear in a summary at the top where you mention the outstanding items. Again, sorry if I missed it. >>> dead_debug_insert_temp (debug, r, insn, DEBUG_TEMP_AFTER_WITH_REG); >>> if (REG_DEAD_DEBUGGING) >>> df_print_note ("adding 2: ", insn, REG_NOTES (insn)); >>> @@ -3493,9 +3493,9 @@ df_create_unused_note (rtx_insn *insn, df_ref def, >>> || bitmap_bit_p (artificial_uses, dregno) >>> || df_ignore_stack_reg (dregno))) >>> { >>> - rtx reg = (DF_REF_LOC (def)) >>> - ? *DF_REF_REAL_LOC (def): DF_REF_REG (def); >>> - df_set_note (REG_UNUSED, insn, reg); >>> + //rtx reg = (DF_REF_LOC (def)) >>> + // ? *DF_REF_REAL_LOC (def): DF_REF_REG (def); >>> + //df_set_note (REG_UNUSED, insn, reg); >>> dead_debug_insert_temp (debug, dregno, insn, >>> DEBUG_TEMP_AFTER_WITH_REG); >>> if (REG_DEAD_DEBUGGING) >>> df_print_note ("adding 3: ", insn, REG_NOTES (insn)); >> >> I don't think this can be right. The last hunk of the var-tracking.cc >> patch also seems to be reverting a correct change. >> > > We generate sequential registers using (subreg V16QI (reg 00mode) 16) > and (reg OOmode 0) > where OOmode is 256 bit and V16QI is 128 bits in order to generate > sequential register pair. If I keep the above REG_UNUSED notes ira generates > REG_UNUSED and in cprop_harreg pass and dce pass deletes store pairs and > we get incorrect code. > > By commenting REG_UNUSED notes it is not generated and we get the correct > store > pair fusion and cprop_hardreg and dce doesn't deletes them. > > Please let me know is there are better ways to address this instead of > commenting > above generation of REG_UNUSED notes. > > Thanks & Regards > Ajit >> Thanks, >> Richard