Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

Alex Coplan Fri, 12 Apr 2024 07:46:12 -0700

On 12/04/2024 20:02, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 11/04/24 7:55 pm, Alex Coplan wrote:
> > On 10/04/2024 23:48, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 10/04/24 7:52 pm, Alex Coplan wrote:
> >>> Hi Ajit,
> >>>
> >>> On 10/04/2024 15:31, Ajit Agarwal wrote:
> >>>> Hello Alex:
> >>>>
> >>>> On 10/04/24 1:42 pm, Alex Coplan wrote:
> >>>>> Hi Ajit,
> >>>>>
> >>>>> On 09/04/2024 20:59, Ajit Agarwal wrote:
> >>>>>> Hello Alex:
> >>>>>>
> >>>>>> On 09/04/24 8:39 pm, Alex Coplan wrote:
> >>>>>>> On 09/04/2024 20:01, Ajit Agarwal wrote:
> >>>>>>>> Hello Alex:
> >>>>>>>>
> >>>>>>>> On 09/04/24 7:29 pm, Alex Coplan wrote:
> >>>>>>>>> On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>>>>>>>>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
> >>>>>>>>>>>> Hello Alex/Richard:
> >>>>>>>>>>>>
> >>>>>>>>>>>> All review comments are incorporated.
> >>> <snip>
> >>>>>>>>>>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
> >>>>>>>>>>>> &left_list,
> >>>>>>>>>>>>  // of accesses.  If we find two sets of adjacent accesses, call
> >>>>>>>>>>>>  // merge_pairs.
> >>>>>>>>>>>>  void
> >>>>>>>>>>>> -ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>> -                                 access_group &group)
> >>>>>>>>>>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>> +                                         access_group &group)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>    const auto lfs = decode_lfs (encoded_lfs);
> >>>>>>>>>>>>    const unsigned access_size = lfs.size;
> >>>>>>>>>>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
> >>>>>>>>>>>> encoded_lfs,
> >>>>>>>>>>>>                         access.cand_insns,
> >>>>>>>>>>>>                         lfs.load_p,
> >>>>>>>>>>>>                         access_size);
> >>>>>>>>>>>> -          skip_next = access.cand_insns.empty ();
> >>>>>>>>>>>> +          skip_next = bb_state->cand_insns_empty_p 
> >>>>>>>>>>>> (access.cand_insns);
> >>>>>>>>>>>
> >>>>>>>>>>> As above, why is this needed?
> >>>>>>>>>>
> >>>>>>>>>> For rs6000 we want to return always true. as load store pair
> >>>>>>>>>> that are to be merged with 8/16 16/32 32/64 is occuring for rs6000.
> >>>>>>>>>> And we want load store pair to 8/16 32/64. Thats why we want
> >>>>>>>>>> to generate always true for rs6000 to skip pairs as above.
> >>>>>>>>>
> >>>>>>>>> Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 
> >>>>>>>>> you have
> >>>>>>>>> load/store pair instructions where the two arms of the access are 
> >>>>>>>>> storing
> >>>>>>>>> operands of different sizes?  Or something else?
> >>>>>>>>>
> >>>>>>>>> As it stands the logic is to skip the next iteration only if we
> >>>>>>>>> exhausted all the candidate insns for the current access.  In the 
> >>>>>>>>> case
> >>>>>>>>> that we didn't exhaust all such candidates, then the idea is that 
> >>>>>>>>> when
> >>>>>>>>> access becomes prev_access, we can attempt to use those candidates 
> >>>>>>>>> as
> >>>>>>>>> the "left-hand side" of a pair in the next iteration since we 
> >>>>>>>>> failed to
> >>>>>>>>> use them as the "right-hand side" of a pair in the current 
> >>>>>>>>> iteration.
> >>>>>>>>> I don't see why you wouldn't want that behaviour.  Please can you
> >>>>>>>>> explain?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> In merge_pair we get the 2 load candiates one load from 0 offset and
> >>>>>>>> other load is from 16th offset. Then in next iteration we get load
> >>>>>>>> from 16th offset and other load from 32 offset. In next iteration
> >>>>>>>> we get load from 32 offset and other load from 48 offset.
> >>>>>>>>
> >>>>>>>> For example:
> >>>>>>>>
> >>>>>>>> Currently we get the load candiates as follows.
> >>>>>>>>
> >>>>>>>> pairs:
> >>>>>>>>
> >>>>>>>> load from 0th offset.
> >>>>>>>> load from 16th offset.
> >>>>>>>>
> >>>>>>>> next pairs:
> >>>>>>>>
> >>>>>>>> load from 16th offset.
> >>>>>>>> load from 32th offset.
> >>>>>>>>
> >>>>>>>> next pairs:
> >>>>>>>>
> >>>>>>>> load from 32th offset
> >>>>>>>> load from 48th offset.
> >>>>>>>>
> >>>>>>>> Instead in rs6000 we should get:
> >>>>>>>>
> >>>>>>>> pairs:
> >>>>>>>>
> >>>>>>>> load from 0th offset
> >>>>>>>> load from 16th offset.
> >>>>>>>>
> >>>>>>>> next pairs:
> >>>>>>>>
> >>>>>>>> load from 32th offset
> >>>>>>>> load from 48th offset.
> >>>>>>>
> >>>>>>> Hmm, so then I guess my question is: why wouldn't you consider merging
> >>>>>>> the pair with offsets (16,32) for rs6000?  Is it because you have a
> >>>>>>> stricter alignment requirement on the base pair offsets (i.e. they 
> >>>>>>> have
> >>>>>>> to be a multiple of 32 when the operand size is 16)?  So the pair
> >>>>>>> offsets have to be a multiple of the entire pair size rather than a
> >>>>>>> single operand size> 
> >>>>>>
> >>>>>> We get load pair at a certain point with (0,16) and other program
> >>>>>> point we get load pair (32, 48).
> >>>>>>
> >>>>>> In current implementation it takes offsets loads as (0, 16),
> >>>>>> (16, 32), (32, 48).
> >>>>>>
> >>>>>> But In rs6000 we want  the load pair to be merged at different points
> >>>>>> as (0,16) and (32, 48). for (0,16) we want to replace load lxvp with
> >>>>>> 0 offset and other load (32, 48) with lxvp with 32 offset.
> >>>>>>
> >>>>>> In current case it will merge with lxvp with 0 offset and lxvp with
> >>>>>> 16 offset, then lxvp with 32 offset and lxvp with 48 offset which
> >>>>>> is incorrect in our case as the (16-32) case 16 offset will not
> >>>>>> load from even register and break for rs6000.
> >>>>>
> >>>>> Sorry, I think I'm still missing something here.  Why does the address 
> >>>>> offset
> >>>>> affect the parity of the tranfser register?  ISTM they needn't be 
> >>>>> related at
> >>>>> all (and indeed we can't even know the parity of the transfer register 
> >>>>> before
> >>>>> RA, but perhaps you're only intending on running the pass after RA?)
> >>>>>
> >>>>
> >>>> We have load pair with (0,16) wherein these loads are adjacent and
> >>>> replaced with lxvp.
> >>>>
> >>>> Semantic of lxvp instruction is that it loads adjacent load pair in
> >>>> even register and even_register + 1.
> >>>>
> >>>> We replace the above load pair with lxvp instruction and then we
> >>>> dont need to merge (16,32) as (0, 16) is already merged and instead
> >>>> we merge (32,48).
> >>>
> >>> Ok, but the existing logic should already account for this.  I.e. if we
> >>> successfully merge (0,16), then we don't attempt to merge (16,32).  We'd 
> >>> only
> >>> attempt to merge (16,32) if the merge of (0,16) failed (for whatever 
> >>> reason).
> >>> So I don't see that there's anything specific to lxvp that requires this 
> >>> logic
> >>> to change, _unless_ you have a stricter alignment requirement on the 
> >>> offsets as
> >>> I mentioned before.
> >>>
> >>
> >> Thanks for the suggestion. It worked for rs6000 also with current changes.
> >> Sorry for the confusion.
> > 
> > Alright, glad we got to the bottom of this!
> 
> Thanks.
> > 
> >>
> >>>>
> >>>> Yes you are correct, the addresss offset doesn't affect the parity of
> >>>> the register transfer as we are doing fusion pass before RA.
> >>>> If that is the case then I think it would be better to introduce a
> >>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base
> >>>>>>> offset of the pair (prev_access->offset in transform_for_base).  I 
> >>>>>>> guess
> >>>>>>> it would also take access_size as a parameter and for aarch64 it 
> >>>>>>> should
> >>>>>>> check:
> >>>>>>>
> >>>>>>>   multiple_p (offset, access_size)
> >>>>>>>
> >>>>>>> and for rs6000 it could check:
> >>>>>>>
> >>>>>>>   multiple_p (offset, access_size * 2)
> >>>>>>>
> >>>>>>> and we would then incorporate a call to that predicate in the else if
> >>>>>>> condition of tranform_for_base.
> >>>>>>>
> >>>>>>> It would have the same limitation whereby we assume that MEM_EXPR 
> >>>>>>> offset
> >>>>>>> alignment is a good proxy for RTL offset alignment, but we already
> >>>>>>> depend on that with the multiple_p check in track_via_mem_expr.
> >>>>>>>
> >>>> I have addressed the above hooks and it worked fine with both rs6000
> >>>> and aarch64. I am sending subsequent patch in some time that address
> >>>> above.
> >>>>
> >>>>> How do you plan on handling this even-odd requirement for rs6000?
> >>>>>
> >>>>
> >>>> We plan to handle with V16QI subreg: 0 and V16QI subreg : 16 to
> >>>> generate register pair and thats what we generate and implement
> >>>> in rs6000 target
> >>>> code.
> >>>
> >>> Ah, this is coming back to me now.  Sorry, I should have remembered this 
> >>> from
> >>> the previous discussion with Richard S.
> >>>
> >>> Apologies for going on a slight tangent here, but if you're running
> >>> before RA are you planning to create a new OImode pseudo register for
> >>> the lxvp insn and then somehow update uses of the old transfer registers
> >>> to replace them with subregs of that OImode pseudo?  
> >>
> >> Yes I do the same as you have mentioned. We generate register pairs
> >> with 256 bit mode with two subregs of 128 bit modes with 0 and
> >> 16 offset.
> >>
> >> Or do you just plan
> >>> or replacing the individual loads with moves (instead of deleting them)?
> >>> I guess the latter would be simpler and might work better in the
> >>> presence of hard regs.
> >>>
> >>
> >> Would you mind explaining how to generate register pairs with lxvp by
> >> replacing loads with moves.
> > 
> > Yeah, so suppose you have something like:
> > 
> > (set (reg:V16QI v1) (mem:V16QI addr))
> > (set (reg:V16QI v2) (mem:V16QI addr+16))
> > 
> > then when you insert the lxvp you can then (logically) replace the
> > original load instructions with moves from the appropriate subreg, as
> > follows:
> > 
> > (set (reg:OI pair-pseudo) (mem:OI addr)) ; lxvp
> > (set (reg:V16QI v1) (subreg:V16QI (reg:OI pair-pseudo) 0))
> > (set (reg:V16QI v2) (subreg:V16QI (reg:OI pair-pseudo) 16))
> > 
> 
> Any Pseudo created with gen_rtx_REG like 
> gen_rtx_REG (OOmode, REGNO (dest_exp) will error'ed out 
> with unrecognize insn by LRA.


I'm not surprised that goes wrong: you can't just create a new REG
rtx in a different mode and reuse the regno of an existing pseudo.

> 
> If I create pseudo with gen_reg_rtx (OOmode) will error'ed
> out with new_defs Pseudo register is not found in
> change->new_defs.

Yeah, I suppose you'd need to add an RTL-SSA def for the new pseudo.

> 
> Also the sequential register pairs are not generated by
> Register Allocator.

So how do you get the sequential pairs with your current approach?  My
understanding was that what I suggested above doesn't really change what
you're doing w.r.t the lxvp insn itself, but maybe I didn't properly
understand the approach taken in your initial patchset.

Thanks,
Alex

> 
> Thats why I haven't used above method as I also thought
> through.
> 
> Please let me know what do you think.
> 
> Thanks & Regards
> Ajit 
> > now I'm not sure off-hand if this exact combination of subregs and mode
> > changes is valid, but hopefully you get the idea.  The benefit of this
> > approach is that it keeps the transformation local and is more
> > compile-time efficient (we don't have to look through all mentions of
> > v1/v2 and replace them with subregs).  I'd hope that the RA can then
> > clean up any redundant moves (especially in the case that v1/v2 are
> > pseudos).
> > 
> > That would mean that you don't need all the grubbing around with DF
> > looking for occurrences of the transfer regs.  Instead we'd just need
> > some logic to insert the extra moves after the lxvp for rs6000, and I
> > think this might fit more cleanly into the current pass.
> > 
> > Does that make sense?  In any case, this shouldn't really affect the
> > preparatory aarch64 patch because we were planning to defer adding any
> > hooks that are only needed for rs6000 from the initial aarch64/generic
> > split.
> > 
> > Thanks,
> > Alex
> > 
> >>
> >> Thanks & Regards
> >> Ajit
> >>  
> >>> Thanks,
> >>> Alex
> >>>
> >>>>
> >>>> Thanks & Regards
> >>>> Ajit 
> >>>>> Thanks,
> >>>>> Alex
> >>>>>
> >>>>>>
> >>>>>> lxvp should load from even registers and then loaded value will
> >>>>>> be in even register and even register +1 (which is odd).
> >>>>>>
> >>>>>> Thanks & Regards
> >>>>>> Ajit
> >>>>>>> If that is the case then I think it would be better to introduce a
> >>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base
> >>>>>>> offset of the pair (prev_access->offset in transform_for_base).  I 
> >>>>>>> guess
> >>>>>>> it would also take access_size as a parameter and for aarch64 it 
> >>>>>>> should
> >>>>>>> check:
> >>>>>>>
> >>>>>>>   multiple_p (offset, access_size)
> >>>>>>>
> >>>>>>> and for rs6000 it could check:
> >>>>>>>
> >>>>>>>   multiple_p (offset, access_size * 2)
> >>>>>>>
> >>>>>>> and we would then incorporate a call to that predicate in the else if
> >>>>>>> condition of tranform_for_base.
> >>>>>>>
> >>>>>>> It would have the same limitation whereby we assume that MEM_EXPR 
> >>>>>>> offset
> >>>>>>> alignment is a good proxy for RTL offset alignment, but we already
> >>>>>>> depend on that with the multiple_p check in track_via_mem_expr.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Alex
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks & Regards
> >>>>>>>> Ajit
> >>>>>>>>> Thanks,
> >>>>>>>>> Alex
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>          }
> >>>>>>>>>>>>        prev_access = &access;
> >>>>>>>>>>>>      }
> >>>>>>>>>>>> @@ -2919,7 +3047,7 @@ ldp_bb_info::transform_for_base (int 
> >>>>>>>>>>>> encoded_lfs,
> >>>>>>>>>>>>  // and remove all the tombstone insns, being sure to reparent 
> >>>>>>>>>>>> any uses
> >>>>>>>>>>>>  // of mem to previous defs when we do this.
> >>>>>>>>>>>>  void
> >>>>>>>>>>>> -ldp_bb_info::cleanup_tombstones ()
> >>>>>>>>>>>> +pair_fusion_bb_info::cleanup_tombstones ()
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>    // No need to do anything if we didn't emit a tombstone insn 
> >>>>>>>>>>>> for this BB.
> >>>>>>>>>>>>    if (!m_emitted_tombstone)
> >>>>>>>>>>>> @@ -2947,7 +3075,7 @@ ldp_bb_info::cleanup_tombstones ()
> >>>>>>>>>>>>  
> >>>>>>>>>>>>  template<typename Map>
> >>>>>>>>>>>>  void
> >>>>>>>>>>>> -ldp_bb_info::traverse_base_map (Map &map)
> >>>>>>>>>>>> +pair_fusion_bb_info::traverse_base_map (Map &map)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>    for (auto kv : map)
> >>>>>>>>>>>>      {
> >>>>>>>>>>>> @@ -2958,7 +3086,7 @@ ldp_bb_info::traverse_base_map (Map &map)
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>  
> >>>>>>>>>>>>  void
> >>>>>>>>>>>> -ldp_bb_info::transform ()
> >>>>>>>>>>>> +pair_fusion_bb_info::transform ()
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>    traverse_base_map (expr_map);
> >>>>>>>>>>>>    traverse_base_map (def_map);
> >>>>>>>>>>>> @@ -3167,14 +3295,13 @@ try_promote_writeback (insn_info *insn)
> >>>>>>>>>>>>  // for load/store candidates.  If running after RA, also try 
> >>>>>>>>>>>> and promote
> >>>>>>>>>>>>  // non-writeback pairs to use writeback addressing.  Then try 
> >>>>>>>>>>>> to fuse
> >>>>>>>>>>>>  // candidates into pairs.
> >>>>>>>>>>>> -void ldp_fusion_bb (bb_info *bb)
> >>>>>>>>>>>> +void pair_fusion::ldp_fusion_bb (bb_info *bb)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>> -  const bool track_loads
> >>>>>>>>>>>> -    = aarch64_tune_params.ldp_policy_model != 
> >>>>>>>>>>>> AARCH64_LDP_STP_POLICY_NEVER;
> >>>>>>>>>>>> -  const bool track_stores
> >>>>>>>>>>>> -    = aarch64_tune_params.stp_policy_model != 
> >>>>>>>>>>>> AARCH64_LDP_STP_POLICY_NEVER;
> >>>>>>>>>>>> +  const bool track_loads = track_load_p ();
> >>>>>>>>>>>> +  const bool track_stores = track_store_p ();
> >>>>>>>>>>>>  
> >>>>>>>>>>>> -  ldp_bb_info bb_state (bb);
> >>>>>>>>>>>
> >>>>>>>>>>> This:
> >>>>>>>>>>>
> >>>>>>>>>>>> +  aarch64_pair_fusion derived;
> >>>>>>>>>>>
> >>>>>>>>>>> can be deleted and then:
> >>>>>>>>>>>
> >>>>>>>>>>>> +  pair_fusion_bb_info bb_info (bb, &derived);
> >>>>>>>>>>>
> >>>>>>>>>>> can just be:
> >>>>>>>>>>>
> >>>>>>>>>>>   pair_fusion_bb_info bb_info (bb, this);
> >>>>>>>>>>>
> >>>>>>>>>>> (or you can pass *this if you make bb_info take a reference).
> >>>>>>>>>>>
> >>>>>>>>>>> I don't think there's a particular need to change the variable 
> >>>>>>>>>>> name
> >>>>>>>>>>> (bb_state -> bb_info).  I chose the former because it doens't 
> >>>>>>>>>>> clash
> >>>>>>>>>>> with the RTL-SSA structure of the same name as the latter.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Addressed.
> >>>>>>>>>>>>  
> >>>>>>>>>>>>    for (auto insn : bb->nondebug_insns ())
> >>>>>>>>>>>>      {
> >>>>>>>>>>>> @@ -3184,31 +3311,31 @@ void ldp_fusion_bb (bb_info *bb)
> >>>>>>>>>>>>          continue;
> >>>>>>>>>>>>  
> >>>>>>>>>>>>        rtx pat = PATTERN (rti);
> >>>>>>>>>>>> -      if (reload_completed
> >>>>>>>>>>>> -          && aarch64_ldp_writeback > 1
> >>>>>>>>>>>> -          && GET_CODE (pat) == PARALLEL
> >>>>>>>>>>>> -          && XVECLEN (pat, 0) == 2)
> >>>>>>>>>>>> +      if (pair_mem_promote_writeback_p (pat))
> >>>>>>>>>>>>          try_promote_writeback (insn);
> >>>>>>>>>>>
> >>>>>>>>>>> It looks like try_promote_writeback itself will need some further 
> >>>>>>>>>>> work
> >>>>>>>>>>> to make it target-independent.  I suppose this check:
> >>>>>>>>>>>
> >>>>>>>>>>>   auto rti = insn->rtl ();
> >>>>>>>>>>>   const auto attr = get_attr_ldpstp (rti);
> >>>>>>>>>>>   if (attr == LDPSTP_NONE)
> >>>>>>>>>>>     return;
> >>>>>>>>>>>
> >>>>>>>>>>>   bool load_p = (attr == LDPSTP_LDP);
> >>>>>>>>>>>   gcc_checking_assert (load_p || attr == LDPSTP_STP);
> >>>>>>>>>>>
> >>>>>>>>>>> will need to become part of the pair_mem_promote_writeback_p hook 
> >>>>>>>>>>> that you
> >>>>>>>>>>> added, potentially changing it to return a boolean for load_p.
> >>>>>>>>>>>
> >>>>>>>>>>> Then I guess we will need hooks for destructuring the pair insn 
> >>>>>>>>>>> and
> >>>>>>>>>>> another hook to wrap aarch64_gen_writeback_pair.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Addressed.
> >>>>>>>>>>>>  
> >>>>>>>>>>>>        if (GET_CODE (pat) != SET)
> >>>>>>>>>>>>          continue;
> >>>>>>>>>>>>  
> >>>>>>>>>>>>        if (track_stores && MEM_P (XEXP (pat, 0)))
> >>>>>>>>>>>> -        bb_state.track_access (insn, false, XEXP (pat, 0));
> >>>>>>>>>>>> +        bb_info.track_access (insn, false, XEXP (pat, 0));
> >>>>>>>>>>>>        else if (track_loads && MEM_P (XEXP (pat, 1)))
> >>>>>>>>>>>> -        bb_state.track_access (insn, true, XEXP (pat, 1));
> >>>>>>>>>>>> +        bb_info.track_access (insn, true, XEXP (pat, 1));
> >>>>>>>>>>>>      }
> >>>>>>>>>>>>  
> >>>>>>>>>>>> -  bb_state.transform ();
> >>>>>>>>>>>> -  bb_state.cleanup_tombstones ();
> >>>>>>>>>>>> +  bb_info.transform ();
> >>>>>>>>>>>> +  bb_info.cleanup_tombstones ();
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>  
> >>>>>>>>>>>>  void ldp_fusion ()
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>    ldp_fusion_init ();
> >>>>>>>>>>>> +  pair_fusion *pfuse;
> >>>>>>>>>>>> +  aarch64_pair_fusion derived;
> >>>>>>>>>>>> +  pfuse = &derived;
> >>>>>>>>>>>
> >>>>>>>>>>> This is indeed the one place where I think it is acceptable to
> >>>>>>>>>>> instantiate aarch64_pair_fusion.  But again there's no need to 
> >>>>>>>>>>> create a
> >>>>>>>>>>> pointer to the parent class, just call any function you like 
> >>>>>>>>>>> directly.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Addressed.
> >>>>>>>>>>>>  
> >>>>>>>>>>>>    for (auto bb : crtl->ssa->bbs ())
> >>>>>>>>>>>> -    ldp_fusion_bb (bb);
> >>>>>>>>>>>> +    pfuse->ldp_fusion_bb (bb);
> >>>>>>>>>>>
> >>>>>>>>>>> I think even the code to iterate over bbs should itself be a 
> >>>>>>>>>>> member
> >>>>>>>>>>> function of pair_fusion (say "run") and then that becomes part of 
> >>>>>>>>>>> the
> >>>>>>>>>>> generic code.
> >>>>>>>>>>>
> >>>>>>>>>>> So this function would just become:
> >>>>>>>>>>>
> >>>>>>>>>>> aarch64_pair_fusion pass;
> >>>>>>>>>>> pass.run ();
> >>>>>>>>>>>
> >>>>>>>>>>> and could be inlined into the caller.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Addressed.
> >>>>>>>>>>> Perhaps you could also add an early return like:
> >>>>>>>>>>>
> >>>>>>>>>>>   if (!track_loads_p () && !track_stores_p ())
> >>>>>>>>>>>     return;
> >>>>>>>>>>>
> >>>>>>>>>>> in pair_fusion::run () and then remove the corresponding code from
> >>>>>>>>>>> pass_ldp_fusion::gate?
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Addressed.
> >>>>>>>>>>
> >>>>>>>>>>>>  
> >>>>>>>>>>>>    ldp_fusion_destroy ();
> >>>>>>>>>>>>  }
> >>>>>>>>>>>> -- 
> >>>>>>>>>>>> 2.39.3
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Alex
> >>>>>>>>>>
> >>>>>>>>>> Thanks & Regards
> >>>>>>>>>> Ajit

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

Reply via email to