Re: [PING][patch, avr] Implement PR90616: Improve adding symbols that are 256-byte aligned

Georg-Johann Lay Wed, 17 Jul 2024 11:27:55 -0700

On 17.07.24 at 19:51, Jeff Law wrote:

On 7/17/24 11:13 AM, Georg-Johann Lay wrote:
On 17.07.24 at 17:55, Jeff Law wrote:
On 7/17/24 9:26 AM, Georg-Johann Lay wrote:
It looks fine for the trunk. Out of curiosity, does the avr port implement linker relaxing for this case? That would seem to be
No. avr-ld performs relaxing, but only the two cases of

- JMP/CALL to RJMP/RCALL provided the offset fits.
- [R]CALL+RET to [R]JMP provided there's no label between.
Yea, the first could be comparable to other targets. The second is probably not all that common since the compiler should be doing that tail call elimination.
It should.  But there are cases where gcc doesn't optimize, like

float add (float a, float b)
{
     return a + b;
}
Presumably the a+b is handled via a libcall rather than a normal call? I guess there might be something in the path where that needs special handling. It's been like 20+ years since I was last in that code. Conceptually I don't see a reason why libcalls would need to be special.
Then there are the calls that are not visible to the compiler, like

long mul (long a, long b)
{
     return b * a;
}

so that the linker relaxations still have something to do.
Yea, if you're emitting the call behind the back of the compiler for this kind of case, then the linker is your only real shot. I did something like that for a few key operations on the mn102 chip eons ago.
One job for Binutils could be optimizing fixed registers like in

char mul3 (char a, char b, char c)
{
     return a * b * c;
}

mul3:
     mul r22,r20     ;  21    [c=12 l=3]  *mulqi3_enh
     mov r22,r0
     clr r1
     mul r22,r24     ;  22    [c=12 l=3]  *mulqi3_enh
     mov r24,r0
     clr r1
     ret         ;  25    [c=0 l=1]  return

The first "clr r1" is redundant because the following mul overwrites r1.
Just like GCC PR20296, the only feasible solution is by letting Binutils
do the job.  But I have no idea how to adjust branches without labels
like RJMP .+20 that cross an instruction that's optimized out.
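The liveness check behind such an optimization can be sketched as follows. This is only an illustration under stated assumptions: the instruction model (enum op, struct insn, and the function clr_r1_is_redundant) is hypothetical and not a Binutils interface, and it ignores the branch-adjustment problem described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction model; not a Binutils API. */
enum op { OP_MUL, OP_MOV, OP_CLR_R1, OP_RET, OP_OTHER };

struct insn {
    enum op op;
    bool reads_r1;   /* instruction uses the value held in r1 */
    bool is_barrier; /* label, branch, call or return: stop scanning */
};

/* Return true if the "clr r1" at index i is dead because a later MUL
   overwrites r1 before anything reads it or control flow intervenes. */
bool clr_r1_is_redundant (const struct insn *code, size_t n, size_t i)
{
    if (i >= n || code[i].op != OP_CLR_R1)
        return false;

    for (size_t k = i + 1; k < n; k++) {
        if (code[k].is_barrier || code[k].reads_r1)
            return false;   /* r1 may still be needed as zero_reg */
        if (code[k].op == OP_MUL)
            return true;    /* MUL writes r1:r0, so the clear was void */
    }
    return false;           /* ran off the end: assume r1 is live */
}
```

Applied to the mul3 listing above, the check would flag the first "clr r1" (a mul follows before any use of r1) and keep the second one, since the ret is treated as a barrier.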
I suspect the most important step is to prevent the assembler from resolving pc-relative jumps and instead emit a suitable relocation. Once that's done I think the branches should get adjusted automatically.


Maybe that's already the case with -mlink-relax?  IIRC that was
introduced to keep the assembler from resolving label differences
when the linker may relax and hence change label differences, because
it shredded debug info.

...appears to work:

void trelax (void)
{
    __asm ("rjmp .+4"    "\n\t"
           "rcall main"  "\n\t"
           "ret"         "\n\t"
           "inc r0");
}

int main (void)
{
    return 0;
}

with -mrelax, the code is:

0000004c <trelax>:
  4c:   01 c0           rjmp    .+2             ; 0x50 <L0^A+0x2>

0000004e <L0^A>:
  4e:   15 c0           rjmp    .+42            ; 0x7a <main>
  50:   03 94           inc     r0
  52:   08 95           ret

so that the RJMP is still targeting the INC.  Though the
optimization itself is performed by ld and not by gas.
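The displacements in the listing can be checked with a small helper. This is a sketch only: the name rjmp_target is made up for illustration, and it assumes the ".+N" notation shown in the disassembly above, where the displacement counts from the end of the 2-byte RJMP.

```c
#include <stdint.h>

/* Byte address targeted by "rjmp .+off" located at byte address pc,
   following the notation of the disassembly above: the displacement
   is relative to the instruction after the 2-byte RJMP. */
uint32_t rjmp_target (uint32_t pc, int32_t off)
{
    return pc + 2 + (uint32_t) off;
}
```

With the addresses from the listing, rjmp_target (0x4c, 2) gives 0x50 (the INC) and rjmp_target (0x4e, 42) gives 0x7a (main).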

And there is the complication that a zero_reg optimization
must only be performed on asm code from C/C++ that is using
the avr-gcc ABI.  But that could be handled by options, so
we'd have a change to the device-specs again :-/
Or maybe better by a directive like .abi gcc or so.

Johann
