Hello Richard: On 06/06/24 2:28 pm, Richard Sandiford wrote: > Hi, > > Just some comments on the fuseable_load_p part, since that's what > we were discussing last time. > > It looks like this now relies on: > > Ajit Agarwal <aagar...@linux.ibm.com> writes: >> + /* We use DF data flow because we change location rtx >> + which is easier to find and modify. >> + We use mix of rtl-ssa def-use and DF data flow >> + where it is easier. */ >> + df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN); >> + df_analyze (); >> + df_set_flags (DF_DEFER_INSN_RESCAN); > > But please don't do this! For one thing, building DU/UD chains > as well as rtl-ssa is really expensive in terms of compile time. > But more importantly, modifications need to happen via rtl-ssa > to ensure that the IL is kept up-to-date. If we don't do that, > later fuse attempts will be based on stale data and so could > generate incorrect code. >
Sure I have made changes to use only rtl-ssa and not to use UD/DU chains. I will send the changes in separate subsequent patch. >> +// Check whether load can be fusable or not. >> +// Return true if fuseable otherwise false. >> +bool >> +rs6000_pair_fusion::fuseable_load_p (insn_info *info) >> +{ >> + for (auto def : info->defs()) >> + { >> + auto set = dyn_cast<set_info *> (def); >> + for (auto use1 : set->nondebug_insn_uses ()) >> + use1->set_is_live_out_use (true); >> + } > > What was the reason for adding this loop? > The purpose of adding is to avoid assert failure in gcc/rtl-ssa/changes.cc:252 >> + >> + rtx_insn *rtl_insn = info ->rtl (); >> + rtx body = PATTERN (rtl_insn); >> + rtx dest_exp = SET_DEST (body); >> + >> + if (REG_P (dest_exp) && >> + (DF_REG_DEF_COUNT (REGNO (dest_exp)) > 1 > > The rtl-ssa way of checking this is: > > crtl->ssa->is_single_dominating_def (...) > >> + || DF_REG_EQ_USE_COUNT (REGNO (dest_exp)) > 0)) >> + return false; > > Why are uses in notes a problem? In the worst case, we should just be > able to remove the note instead. > We can remove this and its no more required. I will make this change in subsequent patches. >> + >> + rtx addr = XEXP (SET_SRC (body), 0); >> + >> + if (GET_CODE (addr) == PLUS >> + && XEXP (addr, 1) && CONST_INT_P (XEXP (addr, 1))) >> + { >> + if (INTVAL (XEXP (addr, 1)) == -16) >> + return false; >> + } > > What's special about -16? > The tests like libgomp/for-8 fails with fused load with offset -16 and 0. Thats why I have added this check. >> + >> + df_ref use; >> + df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ()); >> + FOR_EACH_INSN_INFO_DEF (use, insn_info) >> + { >> + struct df_link *def_link = DF_REF_CHAIN (use); >> + >> + if (!def_link || !def_link->ref >> + || DF_REF_IS_ARTIFICIAL (def_link->ref)) >> + continue; >> + >> + while (def_link && def_link->ref) >> + { >> + rtx_insn *insn = DF_REF_INSN (def_link->ref); >> + if (GET_CODE (PATTERN (insn)) == PARALLEL) >> + return false; > > Why do you need to skip PARALLELs? > vec_select with parallel give failures final.cc "can't split-up with subreg 128 (reg OO" Thats why I have added this. >> + >> + rtx set = single_set (insn); >> + if (set == NULL_RTX) >> + return false; >> + >> + rtx op0 = SET_SRC (set); >> + rtx_code code = GET_CODE (op0); >> + >> + // This check is added as register pairs are not generated >> + // by RA for neg:V2DF (fma: V2DF (reg1) >> + // (reg2) >> + // (neg:V2DF (reg3))) >> + if (GET_RTX_CLASS (code) == RTX_UNARY) >> + return false; > > What's special about (neg (fma ...))? > I am not sure why register allocator fails allocating register pairs with NEG Unary operation with fma operand. I have not debugged register allocator why the NEG Unary operation with fma operand. >> + >> + def_link = def_link->next; >> + } >> + } >> + return true; >> +} > > Thanks, > Richard Thanks & Regards Ajit