I’m reporting the problem and (hopefully) the solution, but I think we’d better double-check this.
The problem: Running the test below in a loop sometimes gets a SIGSEGV in the child process (on x86_64, libc 2.22).

--8<---------------cut here---------------start------------->8---
(use-modules (guix build syscalls) (ice-9 match))

(match (clone (logior CLONE_NEWUSER
                      CLONE_CHILD_SETTID
                      CLONE_CHILD_CLEARTID
                      SIGCHLD))
  (0
   (throw 'x))                                    ;XXX: sometimes segfaults
  (pid
   (match (waitpid pid)
     ((_ . status)
      (pk 'status status)
      (exit (not (status:term-sig status)))))))
--8<---------------cut here---------------end--------------->8---

Looking at (guix build syscalls) though, I see an ABI mismatch between our definition and the actual ‘syscall’ C function, and between our ‘clone’ definition and the actual C function.

This leads to the attached patch, which also fixes the above problem for me.
diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scm
index 80b9d00..f931f8d 100644
--- a/guix/build/syscalls.scm
+++ b/guix/build/syscalls.scm
@@ -322,10 +322,16 @@ string TMPL and return its file name.  TMPL must end with 'XXXXXX'."
 (define CLONE_NEWNET  #x40000000)
 
 ;; The libc interface to sys_clone is not useful for Scheme programs, so the
-;; low-level system call is wrapped instead.
+;; low-level system call is wrapped instead.  The 'syscall' function is
+;; declared in <unistd.h> as a variadic function; in practice, it expects 6
+;; pointer-sized arguments, as shown in, e.g., x86_64/syscall.S.
 (define clone
   (let* ((ptr        (dynamic-func "syscall" (dynamic-link)))
-         (proc       (pointer->procedure int ptr (list int int '*)))
+         (proc       (pointer->procedure long ptr
+                                         (list long                   ;sysno
+                                               unsigned-long          ;flags
+                                               '* '* '*
+                                               '*)))
          ;; TODO: Don't do this.
          (syscall-id (match (utsname:machine (uname))
                        ("i686"   120)
@@ -336,7 +342,10 @@ string TMPL and return its file name.  TMPL must end with 'XXXXXX'."
       "Create a new child process by duplicating the current parent process.
 Unlike the fork system call, clone accepts FLAGS that specify which resources
 are shared between the parent and child processes."
-      (let ((ret (proc syscall-id flags %null-pointer))
+      (let ((ret (proc syscall-id flags
+                       %null-pointer                ;child stack
+                       %null-pointer %null-pointer  ;ptid & ctid
+                       %null-pointer))              ;unused
             (err (errno)))
         (if (= ret -1)
             (throw 'system-error "clone" "~d: ~A"
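For comparison, here is roughly what the same six-argument call looks like from C.  This is only a sketch and not part of the patch: ‘run_child_via_clone’ is a made-up helper name, and it assumes Linux with the usual argument order (flags, child stack, ptid, ctid, one more unused slot).  Every argument is passed explicitly as a pointer-sized value, so no argument register is left holding garbage:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): create a child with a raw,
   fork-like 'clone' system call, have the child exit with EXIT_CODE, and
   return the exit code observed by the parent, or -1 on failure.  */
int
run_child_via_clone (int exit_code)
{
  /* System call number plus five pointer-sized arguments, mirroring the
     argument list given to 'pointer->procedure' in the patch.  */
  long pid = syscall (SYS_clone,
                      (unsigned long) SIGCHLD, /* flags */
                      (void *) 0,              /* child stack: keep the copy */
                      (void *) 0,              /* ptid */
                      (void *) 0,              /* ctid */
                      (void *) 0);             /* unused */
  if (pid == -1)
    return -1;

  if (pid == 0)
    /* Child: leave immediately, without running atexit handlers.  */
    _exit (exit_code);

  /* Parent: wait for the child and report how it exited.  */
  int status;
  if (waitpid ((pid_t) pid, &status, 0) == -1)
    return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Calling ‘run_child_via_clone (7)’ should give 7 back in the parent if the child ran and exited normally; a child killed by a signal such as SIGSEGV would show up as -1 here.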
Could you test this patch?

Now, there remains the question of CLONE_CHILD_SETTID and CLONE_CHILD_CLEARTID.  Since we’re passing NULL for ‘ctid’, I expect that these flags have no effect at all.

Conversely, libc uses these flags to update the thread ID in the child process (x86_64/arch-fork.h):

--8<---------------cut here---------------start------------->8---
#define ARCH_FORK() \
  INLINE_SYSCALL (clone, 4,                                                   \
                  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, 0,     \
                  NULL, &THREAD_SELF->tid)
--8<---------------cut here---------------end--------------->8---

This is certainly useful, but we’d have trouble doing it from the FFI…  It may be that this is fine if the process doesn’t use threads.

Ludo’.