Re: [PATCH] tools/nolibc: add support for N64 and N32 ABIs

Maciej W. Rozycki Mon, 17 Feb 2025 15:23:28 -0800

Hi Thomas,

> thanks for your feedback!


 You're welcome.  Sadly little MIPS-fu seems still available nowadays and 
for me it's often too easy to miss MIPS-related topics in the mailing list 
flood.

> >  Why is this code breaking stack alignment just to have to fix it up two 
> > instructions down the line?  Or is it that the incoming $sp is not aligned 
> > in the first place (in which case we're having a deeper problem).
> 
> nolibc itself does not assume that $sp is aligned.
> Maybe Willy can explain the historical background.

 I'm all ears.

> The System V ABI MIPS supplement [0] says the following:
> 
> The registers listed below have the specified contents at process entry:
>       ...
> 
>       $sp The stack pointer holds the address of the bottom of the stack, 
> which
>       must be doubleword (8 byte) aligned.
>       ...
> 
> However "process entry" is main(), while this code is running before main.

 Umm, no, process entry definitely is not main(); if you refer to the 
somewhat obsolete and inaccurate MIPS psABI, then please note that the 
paragraph right above your quote says:

$2      A non-zero value specifies a function pointer the application 
        should register with atexit(BA_OS).  If $2 contains zero, no 
        action is required.

and one immediately below it says:

$31     The return address register is set to zero so that programs that 
        search backward through stack frames (stack backtracing) recognize 
        the last stack frame, that is, a stack frame with a zero in the 
        saved $31 slot.

and there is more on the initial process stack earlier on, including the 
location of the auxiliary vector.  All it making it clear it's not main() 
this specification refers to, but the entry point from the OS kernel (then 
of course you're aware already it's o32 it talks about; n64/n32 requires 
16 bytes, but then again we only have secondary references[1][2], in this 
case for SGI IRIX).

> The kernel always aligns the stack to a multiple of 16 bytes.
> See the usage of STACK_ROUND() in fs/binfmt_elf.c.
> 
> So I guess we could remove the manual alignment.
> (At least for alignments of 16 bytes and less)

 I think they all need to go then, but then stack pointer adjustments have 
to be made in multiples of the alignment required by the psABI of course.

> > > +
> > > +         /* ABI requires current function address in $t9 */
> > > +#if defined(_ABIO32) || defined(_ABIN32)
> > > +         "lui $t9, %hi(_start_c)\n"
> > >           "ori $t9, %lo(_start_c)\n"
> > > +#else
> > > +         "lui  $t9, %highest(_start_c)\n"
> > > +         "ori  $t9, %higher(_start_c)\n"
> > > +         "dsll $t9, 0x10\n"
> > > +         "ori  $t9, %hi(_start_c)\n"
> > > +         "dsll $t9, 0x10\n"
> > > +         "ori  $t9, %lo(_start_c)\n"
> > 
> >  This could be optimised using a temporary (e.g. $at, but I guess any will 
> > do as I gather we don't have any ABI abnormalities here).
> 
> clang rejects manual usage of $at without ".set noat".
> So $t0 is simpler.

 It's always `.set at' that's required to use $at by hand; it's been so 
long before clang was even thought of.  Or you could use LA and DLA macros 
for o32/n32 and n64 respectively for the assembler to do all this stuff 
for you in the default `.set reorder' mode (assuming that clang is not 
completely broken here).

> > > +#endif
> > > +
> > >           "jalr $t9\n"             /* transfer to c runtime
> > > */
> > >           " nop\n"                 /* delayed slot
> > 
> >  On an unrelated matter JALR above ought to be JAL (or otherwise there's 
> > no point in using the .cprestore pseudo-op).  And I fail to see why this 
> > code has to be "noreorder" (except for the .cpload piece, of course), it's 
> > just asking for troubles.
> 
> Thanks for the hints.
> 
> Without "noreorder", is the manually addition of the delayed slot "nop"
> still necessary?

 It's not.  It's for the `.set noreorder' mode only, to fill the branch 
delay slot by hand.  Otherwise it'll become just a useless instruction 
following the call sequence and executed after return.

> These points also apply to the existing O32 implementation, right?

 Correct.  Sadly it's the first time I see this code.

 Overall I find it a bit of a chimera: it uses `.set noreorder' and 
explicit relocations on one hand, and then high-level assembly `.cpload' 
and `.cprestore' pseudo-ops on the other, effectively mixing the two 
styles of assembly.

 The pseudo-ops come from times when using assembly macros was the norm 
and are there to support that coding model where macros rely on these 
pseudo-ops, and before the use of explicit relocations became the norm at 
least for GCC.  In the absence of assembly macros you can write code 
expansions for these pseudo-ops by hand, just as what GCC does nowadays 
(in the `-mexplicit-relocs' mode, which is usually the default).

 But due to architecture variations it's very hard to write handcoded 
assembly in the `.set noreorder' mode: you need to take care of all the 
data dependencies that appear due to the lack of interlocking between 
pipeline stages, which results in code that either works correctly 
everywhere, but is suboptimal for newer architecture revisions, or code 
that works better with newer architecture revisions, but is actually 
broken with earlier ones.

 About the only typical case where you do want to use `.set noreorder' is 
to schedule a branch delay slot by hand due to a data anti-dependency 
between the two instructions.  Patchable/self-modifying code is a less 
frequent case.  And I had literally one case in my entire career where I 
wanted to actually jump to a branch delay slot instruction (it's still 
there in arch/mips/include/asm/div64.h, in do_div64_32(); see label #2).

 Also it appears no code in this function actually relies on $gp having 
been set up, so perhaps this stuff can just be discarded in the first 
place?

References:

[1] "MIPSpro 64-Bit Porting and Transition Guide", 
    <https://irix7.com/techpubs/007-2391-006.pdf>

[2] "MIPSpro N32 ABI Handbook", 
    <https://irix7.com/techpubs/007-2816-004.pdf>

  Maciej

Re: [PATCH] tools/nolibc: add support for N64 and N32 ABIs

Reply via email to