Ludovic Courtès <l...@gnu.org> writes:
> I was able to capture an strace log of this:
>
> 15837 clone(child_stack=NULL,
> flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> child_tidptr=0x7fb10dad7850) = 15838
> 15838 set_robust_list(0x7fb10dad7860, 24) = 0
> 15837 wait4(15838, <unfinished ...>
> 15838 close(3) = 0
> 15838 close(4) = 0
> 15838 pipe2([3, 4], O_CLOEXEC) = 0
> [...]
> 15838
> clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
> child_tid=0x7fb10beaa990, parent_tid=0x7fb10beaa990, exit_signal=0,
> stack=0x7fb10b51b000, stack_size=0x98ef80, tls=0x7fb10beaa6c0} =>
> {parent_tid=[15839]}, 88) = 15839
> 15839 rseq(0x7fb10beaafe0, 0x20, 0, 0x53053053 <unfinished ...>
> 15838 rt_sigprocmask(SIG_SETMASK, [], <unfinished ...>
> [...]
> 15838 lseek(2, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
> 15839 close(10) = 0
> 15839 close(17 <unfinished ...>
> 15838 dup2(22, 6 <unfinished ...>
> 15839 <... close resumed>) = 0
> 15838 <... dup2 resumed>) = 6
> 15839 close(6 <unfinished ...>
> 15838 fcntl(6, F_GETFL <unfinished ...>
> 15839 <... close resumed>) = 0
> 15838 <... fcntl resumed>) = -1 EBADF (Bad file descriptor)
> 15839 close(7) = 0
> 15839 close(18) = 0
> 15839 close(15) = 0
> 15839 close(12) = 0
> 15839 close(9) = 0
> 15839 close(16) = 0
> 15838 write(2, "Backtrace:\n", 11) = 11
>
> The sequence goes like this:
>
> 1. A child process (15837) corresponding to the subshell is created;
>
> 2. That process creates a finalization thread (15839);
>
> 3. Main thread does dup2(22, 6); finalization does close(6); main
> thread does fcntl(6, F_GETFL), which fails with EBADF.
>
> I suspect something like a wrong revealed count on the relevant ports,
> possibly those created in ‘install-current-ports!’.

In “boot-9.scm”, we have

    (define dup->port
      (case-lambda
        ((port/fd mode)
         (fdopen (dup->fdes port/fd) mode))
        ((port/fd mode new-fd)
         (let ((port (fdopen (dup->fdes port/fd new-fd) mode)))
           (set-port-revealed! port 1)
           port))))

It looks like the system calls on the main thread correspond to this
code (which is called from ‘install-current-ports!’ via ‘dup’).
Specifically, ‘dup2’ is called from ‘dup->fdes’ and ‘fcntl’ is called
from ‘fdopen’.

The way that ‘dup->fdes’ works is that it first makes sure that no
existing port has the desired file descriptor (‘scm_evict_ports’), and
then calls ‘dup2’. This should mean that the requested file descriptor
is up for grabs.

Here’s my guess as to what’s happening. For brevity let’s call the
port with file descriptor 6 “P”.

  1. The GC runs, nullifying the entry for P in the port table (weak
     key hash table), and queuing its finalizer.

  2. The evict ports loop runs, missing P because it was nullified
     (see ‘scm_internal_hash_fold’).

  3. ‘dup2’ turns 22 to 6.

  4. The finalizer for P runs, closing 6.

  5. ‘fdopen’ calls ‘fcntl’ on 6, which results in EBADF.

And here’s a reproducer:

    (let loop ()
      (define fd #f)
      (let ((P (open-input-file "/dev/null")))
        ;; Does not change the revealed count of P.
        (set! fd (fileno P)))
      (let ((port (open-input-file "/dev/null")))
        (dup->port port "r" fd)
        (close-port port)
        (loop)))

This results in EBADF in seemingly exactly the same way. (I had to run
it a few times: sometimes it runs out of file descriptors first.) This
happens on bootstrap Guile (2.0.9) and modern Guile.

That’s all I have for now. I’m not sure how to avoid this without
resorting to calling “(gc)” to synchronously run the finalizers before
trying to mess with the file descriptors.

-- Tim