My buildfarm animal dromedary ran out of disk space yesterday, which I found rather surprising because the last time I'd looked it had tens of GB to spare. On investigation, the problem was lots and lots of core images in /cores, which is where macOS drops them (by default at least). It looked like I was getting one new core image per buildfarm run, even successful runs. Even odder, they mostly seemed to be images from /bin/cp, not Postgres.
After investigation, the mechanism turns out to be that the src/test/recovery/t/010_logical_decoding_timelines.pl test shuts down its replica server with a mode-immediate stop, which causes that postmaster to shut down all its children with SIGQUIT; in particular, that signal propagates to a "cp" command that the archiver process is executing. The "cp" is unsurprisingly running with default SIGQUIT handling, which per the signal man page includes dumping core.

This makes me wonder whether we shouldn't be using some other signal to shut down archiver subprocesses. It's not real cool if we're spewing cores all over the place.

Admittedly, production servers are likely running with "ulimit -c 0" on most modern platforms, so this might not be a huge problem in the field; but accumulation of core files could be a problem anywhere that's configured to allow server core dumps.

I suspect the reason we've not noticed this in the buildfarm is that most of those platforms are configured to dump core into the data directory, where it'll be thrown away when we clean up after the run. But aside from macOS, machines running recent systemd releases are likely accumulating cores somewhere behind the scenes, since systemd has seen fit to insert itself into core handling along with everything else.

Ideally, perhaps, we'd be using SIGINT rather than SIGQUIT to shut down non-Postgres child processes. But redesigning the system's signal handling to make that possible seems like a bit of a mess. Thoughts?

			regards, tom lane
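
PS: for anyone who wants to see the effect outside the TAP test, here's a minimal standalone sketch (not Postgres code, just an illustration of the mechanism): fork a child that execs a program left with default signal handling, the way the archiver's "cp" is, then QUIT it the way the postmaster does during an immediate shutdown. It assumes core dumps are enabled ("ulimit -c unlimited"), and note that WCOREDUMP() isn't POSIX, though macOS and glibc both provide it.

/*
 * quitdemo.c: show that a child with default SIGQUIT handling dumps core.
 */
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	pid_t		child = fork();
	int			status;

	if (child == 0)
	{
		/* child: exec something that keeps default SIGQUIT disposition */
		execl("/bin/sleep", "sleep", "60", (char *) NULL);
		_exit(127);
	}

	sleep(1);					/* give the child time to get through exec() */
	kill(child, SIGQUIT);		/* what an immediate shutdown delivers */

	waitpid(child, &status, 0);
	if (WIFSIGNALED(status))
		printf("child died on signal %d%s\n",
			   WTERMSIG(status),
			   WCOREDUMP(status) ? " (core dumped)" : "");
	return 0;
}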