On 18/10/2024 17:45, Richard Sandiford wrote: > Alex Coplan <alex.cop...@arm.com> writes: > > On 11/10/2024 14:30, Richard Biener wrote: > >> On Fri, 11 Oct 2024, Richard Sandiford wrote: > >> > >> > Alex Coplan <alex.cop...@arm.com> writes: > >> > > Hi, > >> > > > >> > > As the PR shows, pair-fusion was tricking memory_modified_in_insn_p > >> > > into > >> > > returning false when a common base register (in this case, x1) was > >> > > modified between the mem and the store insn. This lead to wrong code > >> > > as > >> > > the accesses really did alias. > >> > > > >> > > To avoid this sort of problem, this patch avoids invoking RTL alias > >> > > analysis altogether (and assume an alias conflict) if the two insns to > >> > > be compared share a common address register R, and the insns see > >> > > different > >> > > definitions of R (i.e. it was modified in between). > >> > > > >> > > Bootstrapped/regtested on aarch64-linux-gnu (all languages, both > >> > > regular > >> > > bootstrap and LTO+PGO bootstrap). OK for trunk? > >> > > >> > Sorry for the slow review. The patch looks good to me, but... > > > > Thanks for the review. I'd missed that you'd sent this, sorry for not > > responding sooner. > > > >> > > >> > > @@ -2544,11 +2624,37 @@ pair_fusion_bb_info::try_fuse_pair (bool > >> > > load_p, unsigned access_size, > >> > > && bitmap_bit_p (&m_tombstone_bitmap, insn->uid ()); > >> > > }; > >> > > > >> > > + // Maximum number of distinct regnos we expect to appear in a single > >> > > + // MEM (and thus in a candidate insn). > >> > > + static constexpr int max_mem_regs = 2; > >> > > + auto_vec<access_info *, max_mem_regs> addr_use_vec[2]; > >> > > + use_array addr_uses[2]; > >> > > + > >> > > + // Collect the lists of register uses that occur in the candidate > >> > > MEMs. > >> > > + for (int i = 0; i < 2; i++) > >> > > + { > >> > > + // N.B. it's safe for us to ignore uses that only occur in notes > >> > > + // here (e.g. in a REG_EQUIV expression) since we only pass the > >> > > + // MEM down to the alias machinery, so it can't see any > >> > > insn-level > >> > > + // notes. > >> > > + for (auto use : insns[i]->uses ()) > >> > > + if (use->is_reg () > >> > > + && use->includes_address_uses () > >> > > + && !use->only_occurs_in_notes ()) > >> > > + { > >> > > + gcc_checking_assert (addr_use_vec[i].length () < > >> > > max_mem_regs); > >> > > + addr_use_vec[i].quick_push (use); > >> > > >> > ...if possible, I think it would be better to just use safe_push here, > >> > without the assert. There'd then be no need to split max_mem_regs out; > >> > it could just be hard-coded in the addr_use_vec declaration. > > > > I hadn't realised at the time that quick_push () already does a > > gcc_checking_assert to make sure that we don't overflow. It does: > > > > template<typename T, typename A> > > inline T * > > vec<T, A, vl_embed>::quick_push (const T &obj) > > { > > gcc_checking_assert (space (1)); > > T *slot = &address ()[m_vecpfx.m_num++]; > > ::new (static_cast<void*>(slot)) T (obj); > > return slot; > > } > > > > (I checked the behaviour by writing a quick selftest in vec.cc, and it > > indeed aborts as expected with quick_push on overflow for a > > stack-allocated auto_vec with N = 2.) > > > > This means that the assert above is indeed redundant, so I agree that > > we should be able to drop the assert and drop the max_mem_regs constant, > > using a literal inside the auto_vec template instead (all while still > > using quick_push). > > > > Does that sound OK to you, or did you have another reason to prefer > > safe_push? AIUI the behaviour of safe_push on overflow would be to > > allocate a new (heap-allocated) vector instead of asserting. > > I just thought it looked odd/unexpected. Normally the intent of: > > auto_vec<foo, 16> bar; > > is to reserve a sensible amount of stack space for the common case, > but still support the general case of arbitrarily many elements. > The common on-stack case will be fast with both quick_push and > safe_push[*]. The difference is just whether growing beyond the > static space would abort the compiler or work as expected. > > quick_push makes sense if an earlier loop has calculated the runtime > length of the vector and if we've already reserved that amount, or if > there is a static property that guarantees a static limit. But the limit > of 2 looked more like a general assumption, rather than something that > had been definitively checked by earlier code. > > I was also wondering whether using safe_push on an array of auto_vecs > caused issues, and so you were having to work around that. (I remember > sometimes hitting a warning about attempts to delete an on-stack buffer, > presumably due to code duplication creating contradictory paths that > jump threading couldn't optimise away as dead.) > > No real objection though. Just wanted to clarify what I meant. :)
I see. For development I wanted to validate empirically what this limit was (at least for aarch64, currently the only target which makes use of pair-fusion), and I wanted to find out if there were cases where we exceeded it (initially this made me realise that we were accidentally including uses in notes, which was useful). But perhaps for the trunk using safe_push is better. If there happens to be some case where more than two regs show up in a MEM with a released compiler, then safe_push would work as expected rather than causing a buffer overflow. I'll make that change. Alex > > Thanks, > Richard > > [*] well, ok, quick_push will be slightly faster in release builds, > since quick_push won't do a bounds check in that case. But the > check in safe_push would be highly predictable.