On Fri, Sep 29, 2023 at 10:22 PM Jeff Law <jeffreya...@gmail.com> wrote: > > > > On 9/12/23 04:13, Manolis Tsamis wrote: > > >>> + > >>> +/* Get the single reaching definition of an instruction inside a BB. > >>> + The definition is desired for REG used in INSN. > >>> + Return the definition insn or NULL if there's no definition with > >>> + the desired criteria. */ > >>> +static rtx_insn* > >>> +get_single_def_in_bb (rtx_insn *insn, rtx reg) > >>> +{ > >>> + df_ref use; > >>> + struct df_link *ref_chain, *ref_link; > >>> + > >>> + FOR_EACH_INSN_USE (use, insn) > >>> + { > >>> + if (GET_CODE (DF_REF_REG (use)) == SUBREG) > >>> + return NULL; > >>> + if (REGNO (DF_REF_REG (use)) == REGNO (reg)) > >>> + break; > >>> + } > >>> + > >>> + if (!use) > >>> + return NULL; > >>> + > >>> + ref_chain = DF_REF_CHAIN (use); > >> So what if there's two uses of REG in INSN? I don't think it's be > >> common at all, but probably better safe and reject than sorry, right? Or > >> is that case filtered out earlier? > >> > > If the REG is the same won't the definitions be the same even if that > > REG appears multiple times in INSN? > Yes. Good, so no issues here. > > > fold_offsets_1 should be able to handle the folding with multiple uses > > of REG just fine, for example add R1, R1 or add (ashift R1, 1), R1. > > If there's no other issue here I assume we want to keep that as-is in > > order to not reduce the propagation power (Which I assume is similar > > to ree which uses the same logic). > OK. I was primarily concerned about the folding and rewriting aspects. > It probably can only show up on targets with LEA like instructions, and > even on such targets it's probably rate. > OK. > > > > >>> +/* Test if INSN is a memory load / store that can have an offset folded > >>> to it. > >>> + Return true iff INSN is such an instruction and return through > >>> MEM_OUT, > >>> + REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that > >>> is > >>> + used as a base address and the offset accordingly. > >>> + All of the out pointers may be NULL in which case they will be > >>> ignored. */ > >>> +bool > >>> +get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out, > >>> + HOST_WIDE_INT *offset_out) > >>> +{ > >>> + rtx set = single_set (insn); > >>> + rtx mem = NULL_RTX; > >>> + > >>> + if (set != NULL_RTX) > >>> + { > >>> + rtx src = SET_SRC (set); > >>> + rtx dest = SET_DEST (set); > >>> + > >>> + /* Don't fold when we have unspec / volatile. */ > >>> + if (GET_CODE (src) == UNSPEC > >>> + || GET_CODE (src) == UNSPEC_VOLATILE > >>> + || GET_CODE (dest) == UNSPEC > >>> + || GET_CODE (dest) == UNSPEC_VOLATILE) > >>> + return false; > >>> + > >>> + if (MEM_P (src)) > >>> + mem = src; > >>> + else if (MEM_P (dest)) > >>> + mem = dest; > >>> + else if ((GET_CODE (src) == SIGN_EXTEND > >>> + || GET_CODE (src) == ZERO_EXTEND) > >>> + && MEM_P (XEXP (src, 0))) > >> Note some architectures allow both a source and destination memory. It > >> looks like your code will prefer the source operand in that case. > >> That's fine, just pointing it out. > >> > > Thanks for pointing that possibility out. I thought for a moment that > > this would be a bug with multiple mentions of the address register. > > but it should be fine due to: > > /* Special case: A foldable memory store is not foldable if it > > mentions DEST outside of the address calculation. */ > ACK. > > The other thing I keep pondering is autoincrement style addressing. > Though I think at some point I convinced myself they weren't a problem. > I think your checks only allow specific kinds of expressions for the > memory address and I don't think {PRE,POST}_{INC,DEC.MODIFY} were in the > list of valid ops. > Yes, although I haven't considered pre/post-inc/dec, they're indeed not allowed due to what is allowed to be a root memory operation (get_fold_mem_root). I believe these shouldn't be an issue as anything that transitively affects even one of these will be rejected.
> > >>> + > >>> + int max_iters = 5; > >>> + for (int i = 0; i < max_iters; i++) > >>> + { > >>> + bool made_changes = false; > >>> + for (fold_info_map::iterator iter = fold_info->begin (); > >>> + iter != fold_info->end (); ++iter) > >>> + { > >>> + fold_mem_info *info = (*iter).second; > >>> + if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns)) > >>> + made_changes |= bitmap_ior_into (&cannot_fold_insns, > >>> + info->fold_insns); > >>> + } > >>> + > >>> + if (!made_changes) > >>> + return true; > >>> + } > >>> + > >>> + return false; > >> So how was the magic value of "5" determined here? In general we try > >> not to have magic #s like that and instead find a better way to control > >> iterations, falling back to a PARAM when all else fails. > >> > > It's indeed just a magic value :) > > In general even two iterations of this should be quite rare. One > > iteration will handle the majority of interesting cases. I chose 5 to > > limit the worst case execution time for degenerate cases. > > > > I'll be happy to change that to something better but I couldn't > > figure out how "falling back to a PARAM when all else fails" should > > work here. Is there a similar example somewhere else in the code that > > I can look at? > If we expect two iterations to be quite rare, then I doubt a param is > really worth the effort. I might suggest using a loop like > > for (pass = 0; pass < flag_expensive_optimizations + 1; pass++) > > No magic numbers :-) We use similar loops elsewhere for this kind of > scenario. > Oh I see, thanks for noting that. I initially added a param, but this is much better as the param wasn't useful in a meaningful way. Done. > > If you ever need to create a param, the procedure is to add the option > to params.opt. If you look in that file you'll see that you define a > variable name in the specification that you can then reference. > > So for a simple example look for param_avg_loop_niter. > So, V6 with the formatting fixes and the iteration count change is here: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631839.html Thanks for all the feedback! Manolis > Jeff