Hey folks, I'm following up on the conversation here, as AFAICS the common factor in all the programs I list below is their use of libgpg-error...
On Tue, Jan 07, 2025 at 05:18:25PM -0500, Daniel Kahn Gillmor wrote:
>Control: forwarded 1079696 https://dev.gnupg.org/T7478
>Control: reassign 1079696 libgpg-error0 1.51-3
>
>Hi Russell--
>
>On Mon 2024-08-26 22:48:02 +1000, Russell Coker wrote:
>> openat(AT_FDCWD, "/proc/self/fd", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY)
>> = -1 ENOENT (No such file or directory)
>> prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1073741816}) = 0
>> close(3) = 0
>> close(4) = 0
>> [...]
>> close(21599889) = -1 EBADF (Bad file descriptor)
>
>Thanks for identifying this. I think the issue is in libgpg-error (aka
>gpgrt)'s mechanism for spawning a POSIX subprocess.
>
>I've reported it upstream to see whether they have any preferred
>solution to the problem.
>
>As for gpgconf itself, it's not even clear to me why `gpgconf --kill
>all` would need to spawn a subprocess, except that the *_runtime_change
>functions in tools/gpgconf-comp.c seem to expect it as an abstraction
>layer, but this is more a question about the engineering choices around
>process management happens in the gpg ecosystem. i'm frequently baffled
>by process management in this suite, so i'm probably not the best person
>to debug it directly.

I'm not sure whether this bug is actually fixed, or whether we're seeing a similar but different bug at Pexip. We build OS packages using chroots inside containers, and every day we rebuild those chroots using:

  sbuild-createchroot --include=eatmydata ${CHROOT_DIST} \
    --chroot-prefix=${CHROOT_PREFIX} \
    /var/chroot/${CHROOT_NAME} ${MIRROR} \
    --extra-repository="${repo}"

In a Bookworm container, this takes a few minutes per target chroot. We've just upgraded our build setup to Trixie and found that this process now takes *hours* instead. The same applies whether we're building Bookworm or Trixie chroots.

We can see that various processes:

 * apt-get ("apt-get update")
 * apt-config
 * gpgv

are all sitting at 100% CPU for an extended length of time.
strace shows them all doing variations on the same theme:

...
fcntl(648153799, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153800, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153801, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153802, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153803, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153804, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153805, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153806, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153807, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153808, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153809, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153810, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153811, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153812, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
fcntl(648153813, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
...

If we watch long enough, we can see that the loop runs all the way up to 1073741816, which is the configured limit for open files on the system. I believe this limit changed between Bookworm and Trixie? Adding "ulimit -n 1024" to restore the Bookworm behaviour makes things complete in a reasonable time.

As far as we can tell, /proc is mounted just fine in the chroots while this runs, so I'm not sure that the (non-)existence of /proc/self/fd is relevant to the behaviour we're seeing.

Cheers,
-- 
Steve McIntyre, Cambridge, UK.
< sladen> I actually stayed in a hotel and arrived to find a post-it note stuck to the mini-bar saying "Paul: This fridge and fittings are the correct way around and do not need altering"

