While investigating code paths that use system() and popen() and trying to write latch-multiplexing replacements, which I'll write about separately, leaks of $SUBJECT became obvious. On a FreeBSD box, you can see the sockets, pipes and data and WAL files that the subprocess inherits:
create table t (x text); copy t from program 'procstat -f $$'; select * from t; (On Linux, you might try something like 'ls -slap /proc/self/fd'. Not sure how to do it on a Mac but note that 'lsof' is no good for this purpose because the first thing it does is close all descriptors > 2; maybe 'lsof -p $PPID' inside a shell wrapper so you're analysing the shell process that sits between postgres and lsof, rather than lsof itself?) Since we've started assuming a few other bits of SUSv3 (POSIX 2008) functionality, the standard that specified O_CLOEXEC, and since in practice Linux, *BSD, macOS, AIX, Solaris, illumos all have it, I think we can unconditionally just use it on all files we open. That is, if we were to make fallback code, it would be untested, and if we were to do it with fcntl() always it would be a frequent extra system call that we don't need to support a computer that doesn't exist. For sockets and pipes, much more rarely created, some systems have non-standard extensions along the same lines, but I guess we should stick with standards and call fcntl(FD_CLOEXEC) for now. There is a place in fd.c that already referenced O_CLOEXEC (it wasn't really using it, just making an assertion that flags don't collide), with #ifdef around it, but that was only conditional because at the time of commit 04cad8f7 we had a macOS 10.4 system (released 2005) in the 'farm which obviously didn't know about POSIX 2008 interfaces. We can just remove that #ifdef. (It's probably OK to remove the test of O_DSYNC too but I'll think about that another time.) On Windows, handles, at least as we create them, are not inherited so the problem doesn't come up AFAICS. I *think* if we were to use Windows' own open(), that would be an issue, but we have our own CreateFile() thing and it doesn't turn on inheritance IIUC. So I just gave O_CLOEXEC a zero definition there. It would be interesting to know what handles a subprocess sees. If someone who knows how to drive Windows could run a subprogram that just does the equivalent of 'sleep 60' they might be able to see that in one of those handle spy tools, to visually check the above. (If I'm wrong about that, it might be possible for a subprocess to interfere with a ProcSignalBarrier command to close all files, so I'd love to know about it.) We were already doing FD_CLOEXEC on the latch self-pipe with comment "we surely do not want any child processes messing with them", so it's not like this wasn't a well-known issue before, but I guess it just never bothered anyone enough to do anything about the more general problem. With the attached, the test at the top of this email shows only in, out, error, and one thing that procstat opened itself. Are there any more descriptors we need to think about?
From 3a267bccd3a4a80f87ae620fc615a310a3f5f1ce Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Sun, 5 Feb 2023 12:18:14 +1300 Subject: [PATCH] Don't leak descriptors into subprograms. Open data and WAL files with O_CLOEXEC from SUSv3, available on all our target systems. Do the same for sockets with FD_CLOEXEC. This means that programs executed by COPY, or archiving scripts etc, will not be able to mess with those descriptors (of course nothing stops them from opening files, so this is more about tidyness and preventing accidental problems rather than security). --- src/backend/access/transam/xlog.c | 9 ++++++--- src/backend/libpq/pqcomm.c | 9 +++++++++ src/backend/storage/file/fd.c | 2 -- src/backend/storage/smgr/md.c | 9 +++++---- src/include/port/win32_port.h | 7 +++++++ 5 files changed, 27 insertions(+), 9 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index fb4c860bde..3b44a0f237 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2937,7 +2937,8 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, * Try to use existent file (checkpoint maker may have created it already) */ *added = false; - fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method)); + fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC | + get_sync_bit(sync_method)); if (fd < 0) { if (errno != ENOENT) @@ -3098,7 +3099,8 @@ XLogFileInit(XLogSegNo logsegno, TimeLineID logtli) return fd; /* Now open original target segment (might not be file I just made) */ - fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method)); + fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC | + get_sync_bit(sync_method)); if (fd < 0) ereport(ERROR, (errcode_for_file_access(), @@ -3329,7 +3331,8 @@ XLogFileOpen(XLogSegNo segno, TimeLineID tli) XLogFilePath(path, tli, segno, wal_segment_size); - fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method)); + fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC | + get_sync_bit(sync_method)); if (fd < 0) ereport(PANIC, (errcode_for_file_access(), diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c index 864c9debe8..d4622533fd 100644 --- a/src/backend/libpq/pqcomm.c +++ b/src/backend/libpq/pqcomm.c @@ -200,6 +200,15 @@ pq_init(void) (errmsg("could not set socket to nonblocking mode: %m"))); #endif +#ifndef WIN32 + /* + * Also make sure that any subprocesses that execute a new program don't + * inherit the socket. + */ + if (fcntl(MyProcPort->sock, F_SETFD, FD_CLOEXEC) < 0) + elog(FATAL, "fcntl(F_SETFD) failed on socket: %m"); +#endif + FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, FeBeWaitSetNEvents); socket_pos = AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock, NULL, NULL); diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index 926d000f2e..933f17b398 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -1033,10 +1033,8 @@ tryAgain: O_TRUNC | O_WRONLY)) == 0, "PG_O_DIRECT value collides with standard flag"); -#if defined(O_CLOEXEC) StaticAssertStmt((PG_O_DIRECT & O_CLOEXEC) == 0, "PG_O_DIRECT value collides with O_CLOEXEC"); -#endif #if defined(O_DSYNC) StaticAssertStmt((PG_O_DIRECT & O_DSYNC) == 0, "PG_O_DIRECT value collides with O_DSYNC"); diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c index 60c9905eff..c464d6338c 100644 --- a/src/backend/storage/smgr/md.c +++ b/src/backend/storage/smgr/md.c @@ -205,14 +205,15 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo) path = relpath(reln->smgr_rlocator, forknum); - fd = PathNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY); + fd = PathNameOpenFile(path, + O_RDWR | O_CREAT | O_EXCL | PG_BINARY | O_CLOEXEC); if (fd < 0) { int save_errno = errno; if (isRedo) - fd = PathNameOpenFile(path, O_RDWR | PG_BINARY); + fd = PathNameOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC); if (fd < 0) { /* be sure to report the error reported by create, not open */ @@ -523,7 +524,7 @@ mdopenfork(SMgrRelation reln, ForkNumber forknum, int behavior) path = relpath(reln->smgr_rlocator, forknum); - fd = PathNameOpenFile(path, O_RDWR | PG_BINARY); + fd = PathNameOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC); if (fd < 0) { @@ -1188,7 +1189,7 @@ _mdfd_openseg(SMgrRelation reln, ForkNumber forknum, BlockNumber segno, fullpath = _mdfd_segpath(reln, forknum, segno); /* open the file */ - fd = PathNameOpenFile(fullpath, O_RDWR | PG_BINARY | oflags); + fd = PathNameOpenFile(fullpath, O_RDWR | PG_BINARY | O_CLOEXEC | oflags); pfree(fullpath); diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h index 9488195799..a8685f1520 100644 --- a/src/include/port/win32_port.h +++ b/src/include/port/win32_port.h @@ -353,6 +353,13 @@ extern int _pglstat64(const char *name, struct stat *buf); */ #define O_DSYNC 0x0080 +/* + * Our open wrapper does not create inheritable handles, so it is safe to + * ignore O_CLOEXEC. (If we were using Windows' own open(), it might be + * necessary to convert this to _O_NOINHERIT.) + */ +#define O_CLOEXEC 0 + /* * Supplement to <errno.h>. * -- 2.39.1