Alex Coplan <alex.cop...@arm.com> writes: > On 11/10/2024 14:30, Richard Biener wrote: >> On Fri, 11 Oct 2024, Richard Sandiford wrote: >> >> > Alex Coplan <alex.cop...@arm.com> writes: >> > > Hi, >> > > >> > > As the PR shows, pair-fusion was tricking memory_modified_in_insn_p into >> > > returning false when a common base register (in this case, x1) was >> > > modified between the mem and the store insn. This lead to wrong code as >> > > the accesses really did alias. >> > > >> > > To avoid this sort of problem, this patch avoids invoking RTL alias >> > > analysis altogether (and assume an alias conflict) if the two insns to >> > > be compared share a common address register R, and the insns see >> > > different >> > > definitions of R (i.e. it was modified in between). >> > > >> > > Bootstrapped/regtested on aarch64-linux-gnu (all languages, both regular >> > > bootstrap and LTO+PGO bootstrap). OK for trunk? >> > >> > Sorry for the slow review. The patch looks good to me, but... > > Thanks for the review. I'd missed that you'd sent this, sorry for not > responding sooner. > >> > >> > > @@ -2544,11 +2624,37 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p, >> > > unsigned access_size, >> > > && bitmap_bit_p (&m_tombstone_bitmap, insn->uid ()); >> > > }; >> > > >> > > + // Maximum number of distinct regnos we expect to appear in a single >> > > + // MEM (and thus in a candidate insn). >> > > + static constexpr int max_mem_regs = 2; >> > > + auto_vec<access_info *, max_mem_regs> addr_use_vec[2]; >> > > + use_array addr_uses[2]; >> > > + >> > > + // Collect the lists of register uses that occur in the candidate >> > > MEMs. >> > > + for (int i = 0; i < 2; i++) >> > > + { >> > > + // N.B. it's safe for us to ignore uses that only occur in notes >> > > + // here (e.g. in a REG_EQUIV expression) since we only pass the >> > > + // MEM down to the alias machinery, so it can't see any insn-level >> > > + // notes. >> > > + for (auto use : insns[i]->uses ()) >> > > + if (use->is_reg () >> > > + && use->includes_address_uses () >> > > + && !use->only_occurs_in_notes ()) >> > > + { >> > > + gcc_checking_assert (addr_use_vec[i].length () < >> > > max_mem_regs); >> > > + addr_use_vec[i].quick_push (use); >> > >> > ...if possible, I think it would be better to just use safe_push here, >> > without the assert. There'd then be no need to split max_mem_regs out; >> > it could just be hard-coded in the addr_use_vec declaration. > > I hadn't realised at the time that quick_push () already does a > gcc_checking_assert to make sure that we don't overflow. It does: > > template<typename T, typename A> > inline T * > vec<T, A, vl_embed>::quick_push (const T &obj) > { > gcc_checking_assert (space (1)); > T *slot = &address ()[m_vecpfx.m_num++]; > ::new (static_cast<void*>(slot)) T (obj); > return slot; > } > > (I checked the behaviour by writing a quick selftest in vec.cc, and it > indeed aborts as expected with quick_push on overflow for a > stack-allocated auto_vec with N = 2.) > > This means that the assert above is indeed redundant, so I agree that > we should be able to drop the assert and drop the max_mem_regs constant, > using a literal inside the auto_vec template instead (all while still > using quick_push). > > Does that sound OK to you, or did you have another reason to prefer > safe_push? AIUI the behaviour of safe_push on overflow would be to > allocate a new (heap-allocated) vector instead of asserting.
I just thought it looked odd/unexpected. Normally the intent of: auto_vec<foo, 16> bar; is to reserve a sensible amount of stack space for the common case, but still support the general case of arbitrarily many elements. The common on-stack case will be fast with both quick_push and safe_push[*]. The difference is just whether growing beyond the static space would abort the compiler or work as expected. quick_push makes sense if an earlier loop has calculated the runtime length of the vector and if we've already reserved that amount, or if there is a static property that guarantees a static limit. But the limit of 2 looked more like a general assumption, rather than something that had been definitively checked by earlier code. I was also wondering whether using safe_push on an array of auto_vecs caused issues, and so you were having to work around that. (I remember sometimes hitting a warning about attempts to delete an on-stack buffer, presumably due to code duplication creating contradictory paths that jump threading couldn't optimise away as dead.) No real objection though. Just wanted to clarify what I meant. :) Thanks, Richard [*] well, ok, quick_push will be slightly faster in release builds, since quick_push won't do a bounds check in that case. But the check in safe_push would be highly predictable.