Hi Thomas, > thanks for your feedback!
You're welcome. Sadly little MIPS-fu seems still available nowadays and for me it's often too easy to miss MIPS-related topics in the mailing list flood. > > Why is this code breaking stack alignment just to have to fix it up two > > instructions down the line? Or is it that the incoming $sp is not aligned > > in the first place (in which case we're having a deeper problem). > > nolibc itself does not assume that $sp is aligned. > Maybe Willy can explain the historical background. I'm all ears. > The System V ABI MIPS supplement [0] says the following: > > The registers listed below have the specified contents at process entry: > ... > > $sp The stack pointer holds the address of the bottom of the stack, > which > must be doubleword (8 byte) aligned. > ... > > However "process entry" is main(), while this code is running before main. Umm, no, process entry definitely is not main(); if you refer to the somewhat obsolete and inaccurate MIPS psABI, then please note that the paragraph right above your quote says: $2 A non-zero value specifies a function pointer the application should register with atexit(BA_OS). If $2 contains zero, no action is required. and one immediately below it says: $31 The return address register is set to zero so that programs that search backward through stack frames (stack backtracing) recognize the last stack frame, that is, a stack frame with a zero in the saved $31 slot. and there is more on the initial process stack earlier on, including the location of the auxiliary vector. All it making it clear it's not main() this specification refers to, but the entry point from the OS kernel (then of course you're aware already it's o32 it talks about; n64/n32 requires 16 bytes, but then again we only have secondary references[1][2], in this case for SGI IRIX). > The kernel always aligns the stack to a multiple of 16 bytes. > See the usage of STACK_ROUND() in fs/binfmt_elf.c. > > So I guess we could remove the manual alignment. > (At least for alignments of 16 bytes and less) I think they all need to go then, but then stack pointer adjustments have to be made in multiples of the alignment required by the psABI of course. > > > + > > > + /* ABI requires current function address in $t9 */ > > > +#if defined(_ABIO32) || defined(_ABIN32) > > > + "lui $t9, %hi(_start_c)\n" > > > "ori $t9, %lo(_start_c)\n" > > > +#else > > > + "lui $t9, %highest(_start_c)\n" > > > + "ori $t9, %higher(_start_c)\n" > > > + "dsll $t9, 0x10\n" > > > + "ori $t9, %hi(_start_c)\n" > > > + "dsll $t9, 0x10\n" > > > + "ori $t9, %lo(_start_c)\n" > > > > This could be optimised using a temporary (e.g. $at, but I guess any will > > do as I gather we don't have any ABI abnormalities here). > > clang rejects manual usage of $at without ".set noat". > So $t0 is simpler. It's always `.set at' that's required to use $at by hand; it's been so long before clang was even thought of. Or you could use LA and DLA macros for o32/n32 and n64 respectively for the assembler to do all this stuff for you in the default `.set reorder' mode (assuming that clang is not completely broken here). > > > +#endif > > > + > > > "jalr $t9\n" /* transfer to c runtime > > > */ > > > " nop\n" /* delayed slot > > > > On an unrelated matter JALR above ought to be JAL (or otherwise there's > > no point in using the .cprestore pseudo-op). And I fail to see why this > > code has to be "noreorder" (except for the .cpload piece, of course), it's > > just asking for troubles. > > Thanks for the hints. > > Without "noreorder", is the manually addition of the delayed slot "nop" > still necessary? It's not. It's for the `.set noreorder' mode only, to fill the branch delay slot by hand. Otherwise it'll become just a useless instruction following the call sequence and executed after return. > These points also apply to the existing O32 implementation, right? Correct. Sadly it's the first time I see this code. Overall I find it a bit of a chimera: it uses `.set noreorder' and explicit relocations on one hand, and then high-level assembly `.cpload' and `.cprestore' pseudo-ops on the other, effectively mixing the two styles of assembly. The pseudo-ops come from times when using assembly macros was the norm and are there to support that coding model where macros rely on these pseudo-ops, and before the use of explicit relocations became the norm at least for GCC. In the absence of assembly macros you can write code expansions for these pseudo-ops by hand, just as what GCC does nowadays (in the `-mexplicit-relocs' mode, which is usually the default). But due to architecture variations it's very hard to write handcoded assembly in the `.set noreorder' mode: you need to take care of all the data dependencies that appear due to the lack of interlocking between pipeline stages, which results in code that either works correctly everywhere, but is suboptimal for newer architecture revisions, or code that works better with newer architecture revisions, but is actually broken with earlier ones. About the only typical case where you do want to use `.set noreorder' is to schedule a branch delay slot by hand due to a data anti-dependency between the two instructions. Patchable/self-modifying code is a less frequent case. And I had literally one case in my entire career where I wanted to actually jump to a branch delay slot instruction (it's still there in arch/mips/include/asm/div64.h, in do_div64_32(); see label #2). Also it appears no code in this function actually relies on $gp having been set up, so perhaps this stuff can just be discarded in the first place? References: [1] "MIPSpro 64-Bit Porting and Transition Guide", <https://irix7.com/techpubs/007-2391-006.pdf> [2] "MIPSpro N32 ABI Handbook", <https://irix7.com/techpubs/007-2816-004.pdf> Maciej