Hello Alex: On 12/04/24 11:02 pm, Ajit Agarwal wrote: > Hello Alex: > > On 12/04/24 8:15 pm, Alex Coplan wrote: >> On 12/04/2024 20:02, Ajit Agarwal wrote: >>> Hello Alex: >>> >>> On 11/04/24 7:55 pm, Alex Coplan wrote: >>>> On 10/04/2024 23:48, Ajit Agarwal wrote: >>>>> Hello Alex: >>>>> >>>>> On 10/04/24 7:52 pm, Alex Coplan wrote: >>>>>> Hi Ajit, >>>>>> >>>>>> On 10/04/2024 15:31, Ajit Agarwal wrote: >>>>>>> Hello Alex: >>>>>>> >>>>>>> On 10/04/24 1:42 pm, Alex Coplan wrote: >>>>>>>> Hi Ajit, >>>>>>>> >>>>>>>> On 09/04/2024 20:59, Ajit Agarwal wrote: >>>>>>>>> Hello Alex: >>>>>>>>> >>>>>>>>> On 09/04/24 8:39 pm, Alex Coplan wrote: >>>>>>>>>> On 09/04/2024 20:01, Ajit Agarwal wrote: >>>>>>>>>>> Hello Alex: >>>>>>>>>>> >>>>>>>>>>> On 09/04/24 7:29 pm, Alex Coplan wrote: >>>>>>>>>>>> On 09/04/2024 17:30, Ajit Agarwal wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 05/04/24 10:03 pm, Alex Coplan wrote: >>>>>>>>>>>>>> On 05/04/2024 13:53, Ajit Agarwal wrote: >>>>>>>>>>>>>>> Hello Alex/Richard: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> All review comments are incorporated. >>>>>> <snip> >>>>>>>>>>>>>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t >>>>>>>>>>>>>>> &left_list, >>>>>>>>>>>>>>> // of accesses. If we find two sets of adjacent accesses, call >>>>>>>>>>>>>>> // merge_pairs. 
>>>>>>>>>>>>>>> void >>>>>>>>>>>>>>> -ldp_bb_info::transform_for_base (int encoded_lfs, >>>>>>>>>>>>>>> - access_group &group) >>>>>>>>>>>>>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs, >>>>>>>>>>>>>>> + access_group &group) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> const auto lfs = decode_lfs (encoded_lfs); >>>>>>>>>>>>>>> const unsigned access_size = lfs.size; >>>>>>>>>>>>>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int >>>>>>>>>>>>>>> encoded_lfs, >>>>>>>>>>>>>>> access.cand_insns, >>>>>>>>>>>>>>> lfs.load_p, >>>>>>>>>>>>>>> access_size); >>>>>>>>>>>>>>> - skip_next = access.cand_insns.empty (); >>>>>>>>>>>>>>> + skip_next = bb_state->cand_insns_empty_p >>>>>>>>>>>>>>> (access.cand_insns); >>>>>>>>>>>>>> >>>>>>>>>>>>>> As above, why is this needed? >>>>>>>>>>>>> >>>>>>>>>>>>> For rs6000 we want to always return true, as load/store pairs >>>>>>>>>>>>> to be merged at 8/16, 16/32, and 32/64 are occurring for >>>>>>>>>>>>> rs6000. >>>>>>>>>>>>> And we want load/store pairs at 8/16 and 32/64. That's why we want >>>>>>>>>>>>> to always return true for rs6000, to skip pairs as above. >>>>>>>>>>>> >>>>>>>>>>>> Hmm, sorry, I'm not sure I follow. Are you saying that for rs6000 >>>>>>>>>>>> you have >>>>>>>>>>>> load/store pair instructions where the two arms of the access are >>>>>>>>>>>> storing >>>>>>>>>>>> operands of different sizes? Or something else? >>>>>>>>>>>> >>>>>>>>>>>> As it stands the logic is to skip the next iteration only if we >>>>>>>>>>>> exhausted all the candidate insns for the current access. In the >>>>>>>>>>>> case >>>>>>>>>>>> that we didn't exhaust all such candidates, then the idea is that >>>>>>>>>>>> when >>>>>>>>>>>> access becomes prev_access, we can attempt to use those candidates >>>>>>>>>>>> as >>>>>>>>>>>> the "left-hand side" of a pair in the next iteration since we >>>>>>>>>>>> failed to >>>>>>>>>>>> use them as the "right-hand side" of a pair in the current >>>>>>>>>>>> iteration. 
>>>>>>>>>>>> I don't see why you wouldn't want that behaviour. Please can you >>>>>>>>>>>> explain? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> In merge_pair we get the 2 load candidates, one load from the 0 offset and >>>>>>>>>>> the other load from the 16th offset. Then in the next iteration we get a load >>>>>>>>>>> from the 16th offset and the other load from the 32nd offset. In the next iteration >>>>>>>>>>> we get a load from the 32nd offset and the other load from the 48th offset. >>>>>>>>>>> >>>>>>>>>>> For example: >>>>>>>>>>> >>>>>>>>>>> Currently we get the load candidates as follows. >>>>>>>>>>> >>>>>>>>>>> pairs: >>>>>>>>>>> >>>>>>>>>>> load from 0th offset. >>>>>>>>>>> load from 16th offset. >>>>>>>>>>> >>>>>>>>>>> next pairs: >>>>>>>>>>> >>>>>>>>>>> load from 16th offset. >>>>>>>>>>> load from 32nd offset. >>>>>>>>>>> >>>>>>>>>>> next pairs: >>>>>>>>>>> >>>>>>>>>>> load from 32nd offset >>>>>>>>>>> load from 48th offset. >>>>>>>>>>> >>>>>>>>>>> Instead in rs6000 we should get: >>>>>>>>>>> >>>>>>>>>>> pairs: >>>>>>>>>>> >>>>>>>>>>> load from 0th offset >>>>>>>>>>> load from 16th offset. >>>>>>>>>>> >>>>>>>>>>> next pairs: >>>>>>>>>>> >>>>>>>>>>> load from 32nd offset >>>>>>>>>>> load from 48th offset. >>>>>>>>>> >>>>>>>>>> Hmm, so then I guess my question is: why wouldn't you consider >>>>>>>>>> merging >>>>>>>>>> the pair with offsets (16,32) for rs6000? Is it because you have a >>>>>>>>>> stricter alignment requirement on the base pair offsets (i.e. they >>>>>>>>>> have >>>>>>>>>> to be a multiple of 32 when the operand size is 16)? So the pair >>>>>>>>>> offsets have to be a multiple of the entire pair size rather than a >>>>>>>>>> single operand size? >>>>>>>>> >>>>>>>>> We get a load pair at a certain point with (0,16), and at another program >>>>>>>>> point we get the load pair (32, 48). >>>>>>>>> >>>>>>>>> In the current implementation it takes the load offsets as (0, 16), >>>>>>>>> (16, 32), (32, 48). >>>>>>>>> >>>>>>>>> But in rs6000 we want the load pair to be merged at different points >>>>>>>>> as (0,16) and (32, 48). 
for (0,16) we want to replace the loads with lxvp with >>>>>>>>> 0 offset and the other loads (32, 48) with lxvp with 32 offset. >>>>>>>>> >>>>>>>>> In the current case it will merge with lxvp with 0 offset and lxvp with >>>>>>>>> 16 offset, then lxvp with 32 offset and lxvp with 48 offset, which >>>>>>>>> is incorrect in our case, as in the (16,32) case the 16 offset will not >>>>>>>>> load from an even register and will break for rs6000. >>>>>>>> >>>>>>>> Sorry, I think I'm still missing something here. Why does the address >>>>>>>> offset >>>>>>>> affect the parity of the transfer register? ISTM they needn't be >>>>>>>> related at >>>>>>>> all (and indeed we can't even know the parity of the transfer register >>>>>>>> before >>>>>>>> RA, but perhaps you're only intending on running the pass after RA?) >>>>>>>> >>>>>>> >>>>>>> We have a load pair with (0,16) wherein these loads are adjacent and >>>>>>> replaced with lxvp. >>>>>>> >>>>>>> The semantics of the lxvp instruction are that it loads an adjacent load pair into >>>>>>> an even register and even_register + 1. >>>>>>> >>>>>>> We replace the above load pair with the lxvp instruction, and then we >>>>>>> don't need to merge (16,32) as (0, 16) is already merged, and instead >>>>>>> we merge (32,48). >>>>>> >>>>>> Ok, but the existing logic should already account for this. I.e. if we >>>>>> successfully merge (0,16), then we don't attempt to merge (16,32). We'd >>>>>> only >>>>>> attempt to merge (16,32) if the merge of (0,16) failed (for whatever >>>>>> reason). >>>>>> So I don't see that there's anything specific to lxvp that requires this >>>>>> logic >>>>>> to change, _unless_ you have a stricter alignment requirement on the >>>>>> offsets as >>>>>> I mentioned before. >>>>>> >>>>> >>>>> Thanks for the suggestion. It worked for rs6000 also with the current changes. >>>>> Sorry for the confusion. >>>> >>>> Alright, glad we got to the bottom of this! >>> >>> Thanks. 
>>>> >>>>> >>>>>>> >>>>>>> Yes you are correct, the address offset doesn't affect the parity of >>>>>>> the register transfer as we are doing the fusion pass before RA. >>>>>>> If that is the case then I think it would be better to introduce a >>>>>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base >>>>>>>>>> offset of the pair (prev_access->offset in transform_for_base). I >>>>>>>>>> guess >>>>>>>>>> it would also take access_size as a parameter and for aarch64 it >>>>>>>>>> should >>>>>>>>>> check: >>>>>>>>>> >>>>>>>>>> multiple_p (offset, access_size) >>>>>>>>>> >>>>>>>>>> and for rs6000 it could check: >>>>>>>>>> >>>>>>>>>> multiple_p (offset, access_size * 2) >>>>>>>>>> >>>>>>>>>> and we would then incorporate a call to that predicate in the else if >>>>>>>>>> condition of transform_for_base. >>>>>>>>>> >>>>>>>>>> It would have the same limitation whereby we assume that MEM_EXPR >>>>>>>>>> offset >>>>>>>>>> alignment is a good proxy for RTL offset alignment, but we already >>>>>>>>>> depend on that with the multiple_p check in track_via_mem_expr. >>>>>>>>>> >>>>>>> I have addressed the above hooks and it worked fine with both rs6000 >>>>>>> and aarch64. I am sending a subsequent patch shortly that addresses >>>>>>> the above. >>>>>>> >>>>>>>> How do you plan on handling this even-odd requirement for rs6000? >>>>>>>> >>>>>>> >>>>>>> We plan to handle it with V16QI subreg: 0 and V16QI subreg: 16 to >>>>>>> generate the register pair, and that's what we generate and implement >>>>>>> in the rs6000 target >>>>>>> code. >>>>>> >>>>>> Ah, this is coming back to me now. Sorry, I should have remembered this >>>>>> from >>>>>> the previous discussion with Richard S. >>>>>> >>>>>> Apologies for going on a slight tangent here, but if you're running >>>>>> before RA are you planning to create a new OImode pseudo register for >>>>>> the lxvp insn and then somehow update uses of the old transfer registers >>>>>> to replace them with subregs of that OImode pseudo? 
>>>>> >>>>> Yes I do the same as you have mentioned. We generate register pairs >>>>> in a 256-bit mode with two subregs in 128-bit modes at offsets 0 and >>>>> 16. >>>>> >>>>> Or do you just plan >>>>>> on replacing the individual loads with moves (instead of deleting them)? >>>>>> I guess the latter would be simpler and might work better in the >>>>>> presence of hard regs. >>>>>> >>>>> >>>>> Would you mind explaining how to generate register pairs with lxvp by >>>>> replacing loads with moves? >>>> >>>> Yeah, so suppose you have something like: >>>> >>>> (set (reg:V16QI v1) (mem:V16QI addr)) >>>> (set (reg:V16QI v2) (mem:V16QI addr+16)) >>>> >>>> then when you insert the lxvp you can then (logically) replace the >>>> original load instructions with moves from the appropriate subreg, as >>>> follows: >>>> >>>> (set (reg:OI pair-pseudo) (mem:OI addr)) ; lxvp >>>> (set (reg:V16QI v1) (subreg:V16QI (reg:OI pair-pseudo) 0)) >>>> (set (reg:V16QI v2) (subreg:V16QI (reg:OI pair-pseudo) 16)) >>>> >>> >>> Any pseudo created with gen_rtx_REG, like >>> gen_rtx_REG (OOmode, REGNO (dest_exp)), errors out >>> with an unrecognized insn in LRA. >> >> I'm not surprised that goes wrong: you can't just create a new REG >> rtx in a different mode and reuse the regno of an existing pseudo. >>
Thanks for the suggestion. >>> >>> If I create a pseudo with gen_reg_rtx (OOmode), it errors out >>> because the new pseudo register is not found in >>> change->new_defs. >> >> Yeah, I suppose you'd need to add an RTL-SSA def for the new pseudo. >> > > Would you mind explaining how I can add an RTL-SSA def for the > new pseudo? I have added an RTL-SSA def for the new pseudo. With that I could get register pairs correctly. > >>> >>> Also the sequential register pairs are not generated by >>> Register Allocator. >> >> So how do you get the sequential pairs with your current approach? My understanding was that what I suggested above doesn't really change what you're doing w.r.t the lxvp insn itself, but maybe I didn't properly understand the approach taken in your initial patchset. >> > > I generate (set (reg:OO pair-pseudo) (mem:OI addr)) ; lxvp > and then at the use point of pair_pseudo generate the following. > > (subreg:V16QI (reg:OI pair-pseudo) 0)) > (subreg:V16QI (reg:OI pair-pseudo) 16)) > I get register pairs correctly by generating the RTL as you have suggested, but we get extra moves that would impact performance. Please let me know what you think. Thanks & Regards Ajit > Thanks & Regards > Ajit >> Thanks, >> Alex >> >>> >>> That's why I hadn't used the above method, though I had also thought >>> it through. >>> >>> Please let me know what you think. >>> >>> Thanks & Regards >>> Ajit >>>> now I'm not sure off-hand if this exact combination of subregs and mode >>>> changes is valid, but hopefully you get the idea. The benefit of this >>>> approach is that it keeps the transformation local and is more >>>> compile-time efficient (we don't have to look through all mentions of >>>> v1/v2 and replace them with subregs). I'd hope that the RA can then >>>> clean up any redundant moves (especially in the case that v1/v2 are >>>> pseudos). >>>> >>>> That would mean that you don't need all the grubbing around with DF >>>> looking for occurrences of the transfer regs. 
Instead we'd just need >>>> some logic to insert the extra moves after the lxvp for rs6000, and I >>>> think this might fit more cleanly into the current pass. >>>> >>>> Does that make sense? In any case, this shouldn't really affect the >>>> preparatory aarch64 patch because we were planning to defer adding any >>>> hooks that are only needed for rs6000 from the initial aarch64/generic >>>> split. >>>> >>>> Thanks, >>>> Alex >>>> >>>>> >>>>> Thanks & Regards >>>>> Ajit >>>>> >>>>>> Thanks, >>>>>> Alex >>>>>> >>>>>>> >>>>>>> Thanks & Regards >>>>>>> Ajit >>>>>>>> Thanks, >>>>>>>> Alex >>>>>>>> >>>>>>>>> >>>>>>>>> lxvp should load from even registers and then the loaded value will >>>>>>>>> be in an even register and even register + 1 (which is odd). >>>>>>>>> >>>>>>>>> Thanks & Regards >>>>>>>>> Ajit >>>>>>>>>> If that is the case then I think it would be better to introduce a >>>>>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base >>>>>>>>>> offset of the pair (prev_access->offset in transform_for_base). I >>>>>>>>>> guess >>>>>>>>>> it would also take access_size as a parameter and for aarch64 it >>>>>>>>>> should >>>>>>>>>> check: >>>>>>>>>> >>>>>>>>>> multiple_p (offset, access_size) >>>>>>>>>> >>>>>>>>>> and for rs6000 it could check: >>>>>>>>>> >>>>>>>>>> multiple_p (offset, access_size * 2) >>>>>>>>>> >>>>>>>>>> and we would then incorporate a call to that predicate in the else if >>>>>>>>>> condition of transform_for_base. >>>>>>>>>> >>>>>>>>>> It would have the same limitation whereby we assume that MEM_EXPR >>>>>>>>>> offset >>>>>>>>>> alignment is a good proxy for RTL offset alignment, but we already >>>>>>>>>> depend on that with the multiple_p check in track_via_mem_expr. 
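[Editor's note: the virtual predicate suggested in the quoted text can be sketched as follows. This is hypothetical code: plain ints stand in for poly_int64, the modulus check stands in for multiple_p, and only the hook names and dispatch structure follow the suggestion.]

```cpp
#include <cassert>

// Base class for the target-independent pass; targets override the hook.
struct pair_fusion
{
  virtual ~pair_fusion () = default;
  // Vet the base offset of a candidate pair (prev_access->offset in
  // transform_for_base), given the size of one operand of the pair.
  virtual bool pair_offset_alignment_ok_p (int offset,
					   int access_size) const = 0;
};

struct aarch64_pair_fusion : pair_fusion
{
  // aarch64: the pair offset need only be a multiple of one operand's size.
  bool pair_offset_alignment_ok_p (int offset, int access_size) const override
  {
    return offset % access_size == 0;
  }
};

struct rs6000_pair_fusion : pair_fusion
{
  // rs6000: lxvp wants the offset aligned to the whole pair's size.
  bool pair_offset_alignment_ok_p (int offset, int access_size) const override
  {
    return offset % (access_size * 2) == 0;
  }
};
```

Under this model, with an access size of 16, the rs6000 variant rejects a pair starting at offset 16 but accepts offsets 0 and 32, matching the (0,16)/(32,48) grouping discussed earlier in the thread.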
>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Alex >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks & Regards >>>>>>>>>>> Ajit >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Alex >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> prev_access = &access; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> @@ -2919,7 +3047,7 @@ ldp_bb_info::transform_for_base (int >>>>>>>>>>>>>>> encoded_lfs, >>>>>>>>>>>>>>> // and remove all the tombstone insns, being sure to reparent >>>>>>>>>>>>>>> any uses >>>>>>>>>>>>>>> // of mem to previous defs when we do this. >>>>>>>>>>>>>>> void >>>>>>>>>>>>>>> -ldp_bb_info::cleanup_tombstones () >>>>>>>>>>>>>>> +pair_fusion_bb_info::cleanup_tombstones () >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> // No need to do anything if we didn't emit a tombstone insn >>>>>>>>>>>>>>> for this BB. >>>>>>>>>>>>>>> if (!m_emitted_tombstone) >>>>>>>>>>>>>>> @@ -2947,7 +3075,7 @@ ldp_bb_info::cleanup_tombstones () >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> template<typename Map> >>>>>>>>>>>>>>> void >>>>>>>>>>>>>>> -ldp_bb_info::traverse_base_map (Map &map) >>>>>>>>>>>>>>> +pair_fusion_bb_info::traverse_base_map (Map &map) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> for (auto kv : map) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> @@ -2958,7 +3086,7 @@ ldp_bb_info::traverse_base_map (Map &map) >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> void >>>>>>>>>>>>>>> -ldp_bb_info::transform () >>>>>>>>>>>>>>> +pair_fusion_bb_info::transform () >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> traverse_base_map (expr_map); >>>>>>>>>>>>>>> traverse_base_map (def_map); >>>>>>>>>>>>>>> @@ -3167,14 +3295,13 @@ try_promote_writeback (insn_info *insn) >>>>>>>>>>>>>>> // for load/store candidates. If running after RA, also try >>>>>>>>>>>>>>> and promote >>>>>>>>>>>>>>> // non-writeback pairs to use writeback addressing. Then try >>>>>>>>>>>>>>> to fuse >>>>>>>>>>>>>>> // candidates into pairs. 
>>>>>>>>>>>>>>> -void ldp_fusion_bb (bb_info *bb) >>>>>>>>>>>>>>> +void pair_fusion::ldp_fusion_bb (bb_info *bb) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> - const bool track_loads >>>>>>>>>>>>>>> - = aarch64_tune_params.ldp_policy_model != >>>>>>>>>>>>>>> AARCH64_LDP_STP_POLICY_NEVER; >>>>>>>>>>>>>>> - const bool track_stores >>>>>>>>>>>>>>> - = aarch64_tune_params.stp_policy_model != >>>>>>>>>>>>>>> AARCH64_LDP_STP_POLICY_NEVER; >>>>>>>>>>>>>>> + const bool track_loads = track_load_p (); >>>>>>>>>>>>>>> + const bool track_stores = track_store_p (); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - ldp_bb_info bb_state (bb); >>>>>>>>>>>>>> >>>>>>>>>>>>>> This: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> + aarch64_pair_fusion derived; >>>>>>>>>>>>>> >>>>>>>>>>>>>> can be deleted and then: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> + pair_fusion_bb_info bb_info (bb, &derived); >>>>>>>>>>>>>> >>>>>>>>>>>>>> can just be: >>>>>>>>>>>>>> >>>>>>>>>>>>>> pair_fusion_bb_info bb_info (bb, this); >>>>>>>>>>>>>> >>>>>>>>>>>>>> (or you can pass *this if you make bb_info take a reference). >>>>>>>>>>>>>> >>>>>>>>>>>>>> I don't think there's a particular need to change the variable >>>>>>>>>>>>>> name >>>>>>>>>>>>>> (bb_state -> bb_info). I chose the former because it doesn't >>>>>>>>>>>>>> clash >>>>>>>>>>>>>> with the RTL-SSA structure of the same name as the latter. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Addressed. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> for (auto insn : bb->nondebug_insns ()) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> @@ -3184,31 +3311,31 @@ void ldp_fusion_bb (bb_info *bb) >>>>>>>>>>>>>>> continue; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> rtx pat = PATTERN (rti); >>>>>>>>>>>>>>> - if (reload_completed >>>>>>>>>>>>>>> - && aarch64_ldp_writeback > 1 >>>>>>>>>>>>>>> - && GET_CODE (pat) == PARALLEL >>>>>>>>>>>>>>> - && XVECLEN (pat, 0) == 2) >>>>>>>>>>>>>>> + if (pair_mem_promote_writeback_p (pat)) >>>>>>>>>>>>>>> try_promote_writeback (insn); >>>>>>>>>>>>>> >>>>>>>>>>>>>> It looks like try_promote_writeback itself will need some >>>>>>>>>>>>>> further work >>>>>>>>>>>>>> to make it target-independent. I suppose this check: >>>>>>>>>>>>>> >>>>>>>>>>>>>> auto rti = insn->rtl (); >>>>>>>>>>>>>> const auto attr = get_attr_ldpstp (rti); >>>>>>>>>>>>>> if (attr == LDPSTP_NONE) >>>>>>>>>>>>>> return; >>>>>>>>>>>>>> >>>>>>>>>>>>>> bool load_p = (attr == LDPSTP_LDP); >>>>>>>>>>>>>> gcc_checking_assert (load_p || attr == LDPSTP_STP); >>>>>>>>>>>>>> >>>>>>>>>>>>>> will need to become part of the pair_mem_promote_writeback_p >>>>>>>>>>>>>> hook that you >>>>>>>>>>>>>> added, potentially changing it to return a boolean for load_p. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Then I guess we will need hooks for destructuring the pair insn >>>>>>>>>>>>>> and >>>>>>>>>>>>>> another hook to wrap aarch64_gen_writeback_pair. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Addressed. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if (GET_CODE (pat) != SET) >>>>>>>>>>>>>>> continue; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if (track_stores && MEM_P (XEXP (pat, 0))) >>>>>>>>>>>>>>> - bb_state.track_access (insn, false, XEXP (pat, 0)); >>>>>>>>>>>>>>> + bb_info.track_access (insn, false, XEXP (pat, 0)); >>>>>>>>>>>>>>> else if (track_loads && MEM_P (XEXP (pat, 1))) >>>>>>>>>>>>>>> - bb_state.track_access (insn, true, XEXP (pat, 1)); >>>>>>>>>>>>>>> + bb_info.track_access (insn, true, XEXP (pat, 1)); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - bb_state.transform (); >>>>>>>>>>>>>>> - bb_state.cleanup_tombstones (); >>>>>>>>>>>>>>> + bb_info.transform (); >>>>>>>>>>>>>>> + bb_info.cleanup_tombstones (); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> void ldp_fusion () >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> ldp_fusion_init (); >>>>>>>>>>>>>>> + pair_fusion *pfuse; >>>>>>>>>>>>>>> + aarch64_pair_fusion derived; >>>>>>>>>>>>>>> + pfuse = &derived; >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is indeed the one place where I think it is acceptable to >>>>>>>>>>>>>> instantiate aarch64_pair_fusion. But again there's no need to >>>>>>>>>>>>>> create a >>>>>>>>>>>>>> pointer to the parent class, just call any function you like >>>>>>>>>>>>>> directly. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Addressed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> for (auto bb : crtl->ssa->bbs ()) >>>>>>>>>>>>>>> - ldp_fusion_bb (bb); >>>>>>>>>>>>>>> + pfuse->ldp_fusion_bb (bb); >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think even the code to iterate over bbs should itself be a >>>>>>>>>>>>>> member >>>>>>>>>>>>>> function of pair_fusion (say "run") and then that becomes part >>>>>>>>>>>>>> of the >>>>>>>>>>>>>> generic code. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So this function would just become: >>>>>>>>>>>>>> >>>>>>>>>>>>>> aarch64_pair_fusion pass; >>>>>>>>>>>>>> pass.run (); >>>>>>>>>>>>>> >>>>>>>>>>>>>> and could be inlined into the caller. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Addressed. 
>>>>>>>>>>>>>> Perhaps you could also add an early return like: >>>>>>>>>>>>>> >>>>>>>>>>>>>> if (!track_loads_p () && !track_stores_p ()) >>>>>>>>>>>>>> return; >>>>>>>>>>>>>> >>>>>>>>>>>>>> in pair_fusion::run () and then remove the corresponding code >>>>>>>>>>>>>> from >>>>>>>>>>>>>> pass_ldp_fusion::gate? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Addressed. >>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ldp_fusion_destroy (); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> 2.39.3 >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Alex >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks & Regards >>>>>>>>>>>>> Ajit
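[Editor's note: the suggested pair_fusion::run () driver, with the early return folded in, might be shaped like the sketch below. This is hypothetical code: bb_stub stands in for RTL-SSA's bb_info, and the real driver iterates crtl->ssa->bbs (); only the class names and the generic-driver structure follow the review suggestions.]

```cpp
#include <cassert>
#include <vector>

// Stand-in for RTL-SSA's bb_info.
struct bb_stub { int id; };

struct pair_fusion
{
  virtual ~pair_fusion () = default;
  virtual bool track_loads_p () const = 0;
  virtual bool track_stores_p () const = 0;
  virtual void ldp_fusion_bb (bb_stub &bb) = 0;

  // Generic driver: lives in the base class, so targets only override the
  // hooks.  Returns the number of basic blocks processed.
  int run (std::vector<bb_stub> &bbs)
  {
    // Early return subsumes the old pass_ldp_fusion::gate check.
    if (!track_loads_p () && !track_stores_p ())
      return 0;
    int n = 0;
    for (auto &bb : bbs)
      {
	ldp_fusion_bb (bb);
	++n;
      }
    return n;
  }
};

struct aarch64_pair_fusion : pair_fusion
{
  bool track_loads_p () const override { return true; }
  bool track_stores_p () const override { return true; }
  void ldp_fusion_bb (bb_stub &) override { /* fuse pairs in this bb */ }
};
```

The caller then reduces to "aarch64_pair_fusion pass; pass.run (...);" as suggested, with no pointer to the base class needed.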