Alex Coplan <alex.cop...@arm.com> writes:
> On 11/10/2024 14:30, Richard Biener wrote:
>> On Fri, 11 Oct 2024, Richard Sandiford wrote:
>> 
>> > Alex Coplan <alex.cop...@arm.com> writes:
>> > > Hi,
>> > >
>> > > As the PR shows, pair-fusion was tricking memory_modified_in_insn_p into
>> > > returning false when a common base register (in this case, x1) was
>> > > modified between the mem and the store insn.  This lead to wrong code as
>> > > the accesses really did alias.
>> > >
>> > > To avoid this sort of problem, this patch avoids invoking RTL alias
>> > > analysis altogether (and assume an alias conflict) if the two insns to
>> > > be compared share a common address register R, and the insns see 
>> > > different
>> > > definitions of R (i.e. it was modified in between).
>> > >
>> > > Bootstrapped/regtested on aarch64-linux-gnu (all languages, both regular
>> > > bootstrap and LTO+PGO bootstrap).  OK for trunk?
>> > 
>> > Sorry for the slow review.  The patch looks good to me, but...
>
> Thanks for the review.  I'd missed that you'd sent this, sorry for not
> responding sooner.
>
>> > 
>> > > @@ -2544,11 +2624,37 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p, 
>> > > unsigned access_size,
>> > >             && bitmap_bit_p (&m_tombstone_bitmap, insn->uid ());
>> > >    };
>> > >  
>> > > +  // Maximum number of distinct regnos we expect to appear in a single
>> > > +  // MEM (and thus in a candidate insn).
>> > > +  static constexpr int max_mem_regs = 2;
>> > > +  auto_vec<access_info *, max_mem_regs> addr_use_vec[2];
>> > > +  use_array addr_uses[2];
>> > > +
>> > > +  // Collect the lists of register uses that occur in the candidate 
>> > > MEMs.
>> > > +  for (int i = 0; i < 2; i++)
>> > > +    {
>> > > +      // N.B. it's safe for us to ignore uses that only occur in notes
>> > > +      // here (e.g. in a REG_EQUIV expression) since we only pass the
>> > > +      // MEM down to the alias machinery, so it can't see any insn-level
>> > > +      // notes.
>> > > +      for (auto use : insns[i]->uses ())
>> > > +        if (use->is_reg ()
>> > > +            && use->includes_address_uses ()
>> > > +            && !use->only_occurs_in_notes ())
>> > > +          {
>> > > +            gcc_checking_assert (addr_use_vec[i].length () < 
>> > > max_mem_regs);
>> > > +            addr_use_vec[i].quick_push (use);
>> > 
>> > ...if possible, I think it would be better to just use safe_push here,
>> > without the assert.  There'd then be no need to split max_mem_regs out;
>> > it could just be hard-coded in the addr_use_vec declaration.
>
> I hadn't realised at the time that quick_push () already does a
> gcc_checking_assert to make sure that we don't overflow.  It does:
>
>   template<typename T, typename A>
>   inline T *
>   vec<T, A, vl_embed>::quick_push (const T &obj)
>   {
>     gcc_checking_assert (space (1));
>     T *slot = &address ()[m_vecpfx.m_num++];
>     ::new (static_cast<void*>(slot)) T (obj);
>     return slot;
>   }
>
> (I checked the behaviour by writing a quick selftest in vec.cc, and it
> indeed aborts as expected with quick_push on overflow for a
> stack-allocated auto_vec with N = 2.)
>
> This means that the assert above is indeed redundant, so I agree that
> we should be able to drop the assert and drop the max_mem_regs constant,
> using a literal inside the auto_vec template instead (all while still
> using quick_push).
>
> Does that sound OK to you, or did you have another reason to prefer
> safe_push?  AIUI the behaviour of safe_push on overflow would be to
> allocate a new (heap-allocated) vector instead of asserting.

I just thought it looked odd/unexpected.  Normally the intent of:

  auto_vec<foo, 16> bar;

is to reserve a sensible amount of stack space for the common case,
but still support the general case of arbitrarily many elements.
The common on-stack case will be fast with both quick_push and
safe_push[*].  The difference is just whether growing beyond the
static space would abort the compiler or work as expected.

quick_push makes sense if an earlier loop has calculated the runtime
length of the vector and if we've already reserved that amount, or if
there is a static property that guarantees a static limit.  But the limit
of 2 looked more like a general assumption, rather than something that
had been definitively checked by earlier code.

I was also wondering whether using safe_push on an array of auto_vecs
caused issues, and so you were having to work around that.  (I remember
sometimes hitting a warning about attempts to delete an on-stack buffer,
presumably due to code duplication creating contradictory paths that
jump threading couldn't optimise away as dead.)

No real objection though.  Just wanted to clarify what I meant. :)

Thanks,
Richard

[*] well, ok, quick_push will be slightly faster in release builds,
    since quick_push won't do a bounds check in that case.  But the
    check in safe_push would be highly predictable.

Reply via email to