On Tue, Feb 24, 2026 at 7:33 AM Richard Sandiford
<[email protected]> wrote:
>
> Andrew Pinski <[email protected]> writes:
> > So the problem here is while forming chains, we don't process hard register
> > conflicts (and ABI based ones) for allocnos which are already part of a
> > chain.
> > This means sometimes we allocate a register to a color which might be
> > clobbered
> > over is live range.
> > Processing clobbers for all allocnos don't work while forming a chain does
> > not work as the chain's front allocnos' candidates does not get updated.
> > So we need to the processing of clobbers (and ABI clobbers) before starting
> > to form the chains.
> >
> > Changes since v1:
> > * v2: remove accidental hack which was there just for testing.
> >
> > Bootstrappd and tested on aarch64-linux-gnu.
> >
> > PR target/123285
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-early-ra.cc (early_ra::form_chains): Process
> > clobbers
> > and ABI clobbers before starting to form the chain.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/pr123285-1.c: New test.
> >
> > Signed-off-by: Andrew Pinski <[email protected]>
>
> Thanks for looking at this. Couple of comments below.
>
> > ---
> > gcc/config/aarch64/aarch64-early-ra.cc | 40 ++++++++++++-------
> > gcc/testsuite/gcc.target/aarch64/pr123285-1.c | 36 +++++++++++++++++
> > 2 files changed, 61 insertions(+), 15 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr123285-1.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-early-ra.cc
> > b/gcc/config/aarch64/aarch64-early-ra.cc
> > index e2adde4ed94..85826ad81b5 100644
> > --- a/gcc/config/aarch64/aarch64-early-ra.cc
> > +++ b/gcc/config/aarch64/aarch64-early-ra.cc
> > @@ -2739,6 +2739,31 @@ early_ra::form_chains ()
> > if (dump_file && (dump_flags & TDF_DETAILS))
> > fprintf (dump_file, "\nChaining allocnos:\n");
> >
> > + // Record conflicts of hard register and ABI conflicts before the
> > + // forming of chains so chains have the updated candidates
> > + for(auto *allocno1 : m_allocnos)
>
> Missing space after "for"
Ok.
>
> > + {
> > + // Record conflicts with direct uses for FPR hard registers.
> > + auto *group1 = allocno1->group ();
> > + for (unsigned int fpr = allocno1->offset; fpr < 32; ++fpr)
> > + {
> > + if (fpr == 30 && group1->regno == 112)
> > + group1->fpr_candidates &= ~(1U << (fpr - allocno1->offset));
>
> The hack still seems to be there :)
So what happened was the following:
I removed it from the tree where I was testing but then missed it from
the version which I had sent.
I just double checked the machine which I had tested with had the
corrected version too.
>
> > + if (fpr_conflicts_with_allocno_p (fpr, allocno1))
> > + group1->fpr_candidates &= ~(1U << (fpr - allocno1->offset));
> > + }
> > +
> > + // Record conflicts due to partially call-clobbered registers.
> > + // (Full clobbers are handled by the previous loop.)
> > + for (unsigned int abi_id = 0; abi_id < NUM_ABI_IDS; ++abi_id)
> > + if (call_in_range_p (abi_id, allocno1->start_point,
> > + allocno1->end_point))
> > + {
> > + auto fprs = partial_fpr_clobbers (abi_id, group1->fpr_size);
> > + group1->fpr_candidates &= ~fprs >> allocno1->offset;
> > + }
> > + }
> > +
>
> I think we should also move:
>
> if (allocno1->is_shared ())
> {
> auto *allocno2 = m_allocnos[allocno1->related_allocno];
> merge_fpr_info (allocno2->group (), group1, allocno2->offset);
> }
>
> although:
>
> if (allocno1->is_shared ())
> {
> if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, " Allocno %d shares the same hard register"
> " as allocno %d\n", allocno1->id,
> allocno1->related_allocno);
> m_shared_allocnos.safe_push (allocno1);
> continue;
> }
>
> should probably stay where it is.
Let me test with the above changes too. This time I am going to make
the changes to the tree where I will send it out first and then copy
the exact patch over to the other machine.
Thanks,
Andrew
>
> OK with those changes, thanks.
>
> Richard
>
> > // Perform (modified) interval graph coloring. First sort by
> > // increasing start point.
> > m_sorted_allocnos.reserve (m_allocnos.length ());
> > @@ -2756,22 +2781,7 @@ early_ra::form_chains ()
> > if (allocno1->chain_next != INVALID_ALLOCNO)
> > continue;
> >
> > - // Record conflicts with direct uses for FPR hard registers.
> > auto *group1 = allocno1->group ();
> > - for (unsigned int fpr = allocno1->offset; fpr < 32; ++fpr)
> > - if (fpr_conflicts_with_allocno_p (fpr, allocno1))
> > - group1->fpr_candidates &= ~(1U << (fpr - allocno1->offset));
> > -
> > - // Record conflicts due to partially call-clobbered registers.
> > - // (Full clobbers are handled by the previous loop.)
> > - for (unsigned int abi_id = 0; abi_id < NUM_ABI_IDS; ++abi_id)
> > - if (call_in_range_p (abi_id, allocno1->start_point,
> > - allocno1->end_point))
> > - {
> > - auto fprs = partial_fpr_clobbers (abi_id, group1->fpr_size);
> > - group1->fpr_candidates &= ~fprs >> allocno1->offset;
> > - }
> > -
> > if (allocno1->is_shared ())
> > {
> > if (dump_file && (dump_flags & TDF_DETAILS))
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr123285-1.c
> > b/gcc/testsuite/gcc.target/aarch64/pr123285-1.c
> > new file mode 100644
> > index 00000000000..9ef5a28c9af
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr123285-1.c
> > @@ -0,0 +1,36 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O3" } */
> > +/* PR target/123285 */
> > +
> > +#define BS_VEC(type, num) type __attribute__((vector_size(num *
> > sizeof(type))))
> > +
> > +/* f used to allocate v30 to either a or b and the inline-asm
> > + would clobber the v30. */
> > +[[gnu::noipa]]
> > +BS_VEC(int, 8) f(BS_VEC(int, 8) a, BS_VEC(int, 8) b)
> > +{
> > + a+=b;
> > + asm("movi v30.16b, 0":::"v30");
> > + a+=b;
> > + return a;
> > +}
> > +[[gnu::noipa]]
> > +BS_VEC(int, 8) f1(BS_VEC(int, 8) a, BS_VEC(int, 8) b)
> > +{
> > + a+=b;
> > + a+=b;
> > + return a;
> > +}
> > +
> > +int main()
> > +{
> > + BS_VEC(int, 8) a = {0,1,2,3,4,5,6,7};
> > + BS_VEC(int, 8) b = {8,9,10,11,12,13,14};
> > + BS_VEC(int, 8) c0 = f(a,b);
> > + BS_VEC(int, 8) c1 = f1(a,b);
> > + for(int i=0;i<8;i++)
> > + if ( c0[i] != c1[i] )
> > + __builtin_abort ();
> > +}
> > +
> > +