On Mon, Mar 26, 2018 at 9:41 PM, Seebs <se...@seebs.net> wrote:
>
>> The syscall manpage is from the kernel manpages, not glibc.
>> http://man7.org/linux/man-pages/man2/syscall.2.html
>
> And yet! glibc is setting those registers in its code. Why? If that's a
> kernel thing and libc doesn't need to do it, why is libc doing it?

Of course the libc syscall() is setting those registers WITHIN its code. The job of the syscall() function is to translate from a C callable API into a kernel syscall, so it must read the arguments passed in from the C caller (via the normal C function call rules, e.g. the first few arguments passed in registers, the rest on the stack, etc) and use them to set up a kernel syscall (via the kernel's syscall interface, i.e. a maximum of 6 arguments, all passed in registers).

After the kernel syscall has returned, the implementation of libc syscall() needs to collect the result from whichever register the kernel leaves it in and return it via the normal C function call rules (plus take care of some extra housekeeping, i.e. setting errno).

Whatever happens within syscall() is not important to the caller. The key point is that it's a C callable function and follows standard C function call rules.

> Okay, you've read the code in glibc and understand it. So, why does the
> glibc code have that register-setting assembly, if that register-setting
> assembly doesn't matter?

If you are asking why glibc implements syscall() in assembler when it could be implemented in completely generic C code (as musl does), then the answer is I don't know. Historical reasons, I guess.

Looking at the glibc 32bit ARM syscall() assembler.
After stripping away the cfi_XXX annotations (i.e. directives related to debug info, not actual opcodes) the assembler is:

    ENTRY (syscall)
        mov     ip, sp
        push    {r4, r5, r6, r7}
        mov     r7, r0
        mov     r0, r1
        mov     r1, r2
        mov     r2, r3
        ldmfd   ip, {r3, r4, r5, r6}
        swi     0x0
        pop     {r4, r5, r6, r7}
        cmn     r0, #4096
        it      cc
        RETINSTR(cc, lr)
        b       PLTJMP(syscall_error)
    PSEUDO_END (syscall)

i.e. it's pushing the original contents of r4, r5, r6 and r7 to the stack, shuffling the first 4 arguments from C into the kernel's syscall registers (the syscall number in r0 -> r7, the first argument in r1 -> r0, etc), then loading the next 4 arguments from C into registers (cunningly, a single ldmfd loads them directly from the caller's stack into r3, r4, r5 and r6, the registers used for the next 4 arguments of the kernel syscall).

Interestingly, it's taking a total of 8 arguments from the C caller - the first is the syscall number, then 7 additional arguments (one more than required if the maximum is 6).

It then invokes the syscall, restores the caller's original r4, r5, r6 and r7 values from the stack and returns directly if the result is not an error. If the result from the kernel indicates an error, it branches to a helper which sets errno.

Now, looking at the C code implementation of syscall() in musl:

    long syscall(long n, ...)
    {
        va_list ap;
        syscall_arg_t a,b,c,d,e,f;
        va_start(ap, n);
        a=va_arg(ap, syscall_arg_t);
        b=va_arg(ap, syscall_arg_t);
        c=va_arg(ap, syscall_arg_t);
        d=va_arg(ap, syscall_arg_t);
        e=va_arg(ap, syscall_arg_t);
        f=va_arg(ap, syscall_arg_t);
        va_end(ap);
        return __syscall_ret(__syscall(n,a,b,c,d,e,f));
    }

It fetches 6 va_arg arguments from the caller, using standard C function calling rules, and passes them on to the architecture specific __syscall() macro, which puts the arguments in the registers used for the kernel syscall and then invokes the syscall. Note that since this is pure generic C code, you can insert debug, call other functions etc wherever you like (the only thing that needs special attention is that __syscall_ret() sets errno).
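
The errno housekeeping mentioned above (the "cmn r0, #4096" test in the glibc assembler, __syscall_ret() in musl) can be sketched in plain C. This is only a rough illustration, not a copy of either library: the function name sketch_syscall_ret is hypothetical, and it assumes the usual Linux convention that a failing syscall returns a negated errno value in the range -4095..-1.

```c
#include <errno.h>

/*
 * Hypothetical helper, for illustration only.
 * Takes the raw value left in the result register by the kernel and
 * converts it to the normal C convention: -1 with errno set on failure,
 * the value itself otherwise.
 */
long sketch_syscall_ret(unsigned long raw)
{
    /* Assumption: values in the top 4095 of the unsigned range are -errno. */
    if (raw > -4096UL) {
        errno = (int)-raw;   /* e.g. a raw result of -2 becomes errno 2 */
        return -1;
    }
    return (long)raw;        /* success: pass the result through untouched */
}
```

A raw result of 42 comes back unchanged, while a raw result of -ENOENT comes back as -1 with errno set to ENOENT.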

Compiling the musl C code for 32bit ARM gives the following assembler:

    00000000 <syscall>:
       0:   e92d000f    push    {r0, r1, r2, r3}
       4:   e92d48b0    push    {r4, r5, r7, fp, lr}
       8:   e28db010    add     fp, sp, #16
       c:   e28b0008    add     r0, fp, #8
      10:   e24dd00c    sub     sp, sp, #12
      14:   e28bc008    add     ip, fp, #8
      18:   e59b7004    ldr     r7, [fp, #4]
      1c:   e50bc018    str     ip, [fp, #-24]  ; 0xffffffe8
      20:   e890000f    ldm     r0, {r0, r1, r2, r3}
      24:   e59b4018    ldr     r4, [fp, #24]
      28:   e59b501c    ldr     r5, [fp, #28]
      2c:   ef000000    svc     0x00000000
      30:   ebfffffe    bl      0 <__syscall_ret>
      34:   e24bd010    sub     sp, fp, #16
      38:   e8bd48b0    pop     {r4, r5, r7, fp, lr}
      3c:   e28dd010    add     sp, sp, #16
      40:   e12fff1e    bx      lr

Although this is a bit of a mess (gcc obviously isn't good at optimising va_args, as it needlessly saves the first 4 arguments to the stack and then loads them back again...) the basic shuffling of arguments from a C function call into the registers used for the kernel syscall is the same as the glibc assembler! (Apart from the fact that it only handles 6 syscall arguments, not 7 as the glibc assembler does, so nothing is set up in r6.)

i.e. the glibc assembler isn't some mysterious function with a non-standard calling convention - it's just an optimised implementation of a standard C function.

> Okay, you say you understand why ARM EABI "sometimes" needs an argument
> to offset things. What are the circumstances?

The background to this is that in ARM 32bit EABI, 64bit values in registers need to be kept in an even/odd register pair, which then allows "double word" load and store instructions (i.e. single instructions, first added in ARMv5, which can load or store 64bit values from an even/odd register pair) to be used to read and write them to/from memory.

Since the ARM 32bit EABI kernel syscall interface uses registers r0, r1, r2, r3, etc to pass the syscall arguments, a padding argument is required if the first word of a 64bit value passed to the kernel would not naturally be placed into an even numbered register.
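
The even/odd rule described above can be sketched as a small register assignment routine. This is a simplified model for illustration, not code from the kernel or any libc: the name assign_arg_regs and its interface are hypothetical, and it only models arguments that are exactly 4 or 8 bytes wide.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * Given the size in bytes (4 or 8) of each syscall argument, record the
 * first register (0 for r0, 1 for r1, ...) each argument would occupy
 * when 64bit values must start in an even numbered register.
 * Returns the number of registers consumed, padding included.
 */
int assign_arg_regs(const int *arg_bytes, size_t nargs, int *first_reg)
{
    int next = 0;                     /* next free register */

    for (size_t i = 0; i < nargs; i++) {
        assert(arg_bytes[i] == 4 || arg_bytes[i] == 8);
        if (arg_bytes[i] == 8 && (next % 2) != 0)
            next++;                   /* odd register is skipped: the padding */
        first_reg[i] = next;
        next += arg_bytes[i] / 4;     /* one register per 32bit word */
    }
    return next;
}
```

With sizes {4, 8, 4} (a 32bit argument, then a 64bit one, then another 32bit one) the arguments start in r0, r2 and r4, so r1 is left as padding and 5 registers are consumed in total.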

In the readahead example, the first syscall argument is the 32bit file descriptor (which will be passed to the kernel in r0), therefore a padding argument is required to fill r1 and ensure that the first word of the 64bit offset gets passed in r2.

> Is it specific to 32-bit
> targets?

The above is completely specific to ARM 32bit EABI. I guess *similar* issues may apply to some other 32bit architectures (as suggested in the manpage). It's certainly not an issue which is generic to all 32bit targets though.

> On a target with 64-bit pointers, would it apply also to
> 64-bit pointers, or is it exclusively for 64-bit integers?

Since 64bit architectures can, by definition, read and write 64bit values to memory using single load and store instructions, no 64bit architecture would have an ABI which requires 64bit values to be held in any particular register - so no padding arguments would ever be required to accommodate that.

> Because it seems to me that on a 64-bit target, renameat2() would in
> fact be passing a 64-bit object as the second argument. And if there's
> a reason that this doesn't count as a 64-bit argument passed after an
> odd number of 32-bit arguments, I'd like to know specifically what that
> reason is before I go relying on it to stay true forever.

For a 64bit architecture, the distinction between a 32bit argument and a 64bit argument is only in how you interpret the data. In all cases the data is passed as a 64bit value.

The code calling libc syscall() and the code within the kernel which interprets the syscall arguments must agree on the format of the data, but a libc syscall() implementation which just passes the arguments along can treat everything as 64bit values. It doesn't matter if an argument is actually an int, a long, or a pointer. See the musl syscall() implementation - all va_arg values are extracted from the caller as long (syscall_arg_t in the code above).

If syscall(), or a wrapper for it, *does* need to interpret the arguments for a particular syscall then the syscall() implementation would have to also agree with the interpretation of the data defined by the kernel.

-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core