While investigating code paths that use system() and popen() and
trying to write latch-multiplexing replacements, which I'll write
about separately, leaks of $SUBJECT became obvious.  On a FreeBSD box,
you can see the sockets, pipes and data and WAL files that the
subprocess inherits:

  create table t (x text);
  copy t from program 'procstat -f $$';
  select * from t;

(On Linux, you might try something like 'ls -slap /proc/self/fd'.  Not
sure how to do it on a Mac but note that 'lsof' is no good for this
purpose because the first thing it does is close all descriptors > 2;
maybe 'lsof -p $PPID' inside a shell wrapper so you're analysing the
shell process that sits between postgres and lsof, rather than lsof
itself?)

Since we've started assuming a few other bits of SUSv3 (POSIX 2008)
functionality, the standard that specified O_CLOEXEC, and since in
practice Linux, *BSD, macOS, AIX, Solaris, illumos all have it, I
think we can unconditionally just use it on all files we open.  That
is, if we were to make fallback code, it would be untested, and if we
were to do it with fcntl() always it would be a frequent extra system
call that we don't need to support a computer that doesn't exist.  For
sockets and pipes, much more rarely created, some systems have
non-standard extensions along the same lines, but I guess we should
stick with standards and call fcntl(FD_CLOEXEC) for now.

There is a place in fd.c that already referenced O_CLOEXEC (it wasn't
really using it, just making an assertion that flags don't collide),
with #ifdef around it, but that was only conditional because at the
time of commit 04cad8f7 we had a macOS 10.4 system (released 2005) in
the 'farm which obviously didn't know about POSIX 2008 interfaces.  We
can just remove that #ifdef.  (It's probably OK to remove the test of
O_DSYNC too but I'll think about that another time.)

On Windows, handles, at least as we create them, are not inherited so
the problem doesn't come up AFAICS.  I *think* if we were to use
Windows' own open(), that would be an issue, but we have our own
CreateFile() thing and it doesn't turn on inheritance IIUC.  So I just
gave O_CLOEXEC a zero definition there.  It would be interesting to
know what handles a subprocess sees.  If someone who knows how to
drive Windows could run a subprogram that just does the equivalent of
'sleep 60' they might be able to see that in one of those handle spy
tools, to visually check the above.  (If I'm wrong about that, it
might be possible for a subprocess to interfere with a
ProcSignalBarrier command to close all files, so I'd love to know
about it.)

We were already doing FD_CLOEXEC on the latch self-pipe with comment
"we surely do not want any child processes messing with them", so it's
not like this wasn't a well-known issue before, but I guess it just
never bothered anyone enough to do anything about the more general
problem.

With the attached, the test at the top of this email shows only in,
out, error, and one thing that procstat opened itself.

Are there any more descriptors we need to think about?
From 3a267bccd3a4a80f87ae620fc615a310a3f5f1ce Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Sun, 5 Feb 2023 12:18:14 +1300
Subject: [PATCH] Don't leak descriptors into subprograms.

Open data and WAL files with O_CLOEXEC from SUSv3, available on all our
target systems.  Do the same for sockets with FD_CLOEXEC.  This means
that programs executed by COPY, or archiving scripts etc, will not be
able to mess with those descriptors (of course nothing stops them from
opening files, so this is more about tidyness and preventing accidental
problems rather than security).
---
 src/backend/access/transam/xlog.c | 9 ++++++---
 src/backend/libpq/pqcomm.c        | 9 +++++++++
 src/backend/storage/file/fd.c     | 2 --
 src/backend/storage/smgr/md.c     | 9 +++++----
 src/include/port/win32_port.h     | 7 +++++++
 5 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fb4c860bde..3b44a0f237 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2937,7 +2937,8 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	 * Try to use existent file (checkpoint maker may have created it already)
 	 */
 	*added = false;
-	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method));
+	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
+							 get_sync_bit(sync_method));
 	if (fd < 0)
 	{
 		if (errno != ENOENT)
@@ -3098,7 +3099,8 @@ XLogFileInit(XLogSegNo logsegno, TimeLineID logtli)
 		return fd;
 
 	/* Now open original target segment (might not be file I just made) */
-	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method));
+	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
+							 get_sync_bit(sync_method));
 	if (fd < 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
@@ -3329,7 +3331,8 @@ XLogFileOpen(XLogSegNo segno, TimeLineID tli)
 
 	XLogFilePath(path, tli, segno, wal_segment_size);
 
-	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method));
+	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
+							 get_sync_bit(sync_method));
 	if (fd < 0)
 		ereport(PANIC,
 				(errcode_for_file_access(),
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 864c9debe8..d4622533fd 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -200,6 +200,15 @@ pq_init(void)
 				(errmsg("could not set socket to nonblocking mode: %m")));
 #endif
 
+#ifndef WIN32
+	/*
+	 * Also make sure that any subprocesses that execute a new program don't
+	 * inherit the socket.
+	 */
+	if (fcntl(MyProcPort->sock, F_SETFD, FD_CLOEXEC) < 0)
+		elog(FATAL, "fcntl(F_SETFD) failed on socket: %m");
+#endif
+
 	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, FeBeWaitSetNEvents);
 	socket_pos = AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE,
 								   MyProcPort->sock, NULL, NULL);
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 926d000f2e..933f17b398 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -1033,10 +1033,8 @@ tryAgain:
 					   O_TRUNC |
 					   O_WRONLY)) == 0,
 					 "PG_O_DIRECT value collides with standard flag");
-#if defined(O_CLOEXEC)
 	StaticAssertStmt((PG_O_DIRECT & O_CLOEXEC) == 0,
 					 "PG_O_DIRECT value collides with O_CLOEXEC");
-#endif
 #if defined(O_DSYNC)
 	StaticAssertStmt((PG_O_DIRECT & O_DSYNC) == 0,
 					 "PG_O_DIRECT value collides with O_DSYNC");
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 60c9905eff..c464d6338c 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -205,14 +205,15 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 
 	path = relpath(reln->smgr_rlocator, forknum);
 
-	fd = PathNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
+	fd = PathNameOpenFile(path,
+						  O_RDWR | O_CREAT | O_EXCL | PG_BINARY | O_CLOEXEC);
 
 	if (fd < 0)
 	{
 		int			save_errno = errno;
 
 		if (isRedo)
-			fd = PathNameOpenFile(path, O_RDWR | PG_BINARY);
+			fd = PathNameOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC);
 		if (fd < 0)
 		{
 			/* be sure to report the error reported by create, not open */
@@ -523,7 +524,7 @@ mdopenfork(SMgrRelation reln, ForkNumber forknum, int behavior)
 
 	path = relpath(reln->smgr_rlocator, forknum);
 
-	fd = PathNameOpenFile(path, O_RDWR | PG_BINARY);
+	fd = PathNameOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC);
 
 	if (fd < 0)
 	{
@@ -1188,7 +1189,7 @@ _mdfd_openseg(SMgrRelation reln, ForkNumber forknum, BlockNumber segno,
 	fullpath = _mdfd_segpath(reln, forknum, segno);
 
 	/* open the file */
-	fd = PathNameOpenFile(fullpath, O_RDWR | PG_BINARY | oflags);
+	fd = PathNameOpenFile(fullpath, O_RDWR | PG_BINARY | O_CLOEXEC | oflags);
 
 	pfree(fullpath);
 
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 9488195799..a8685f1520 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -353,6 +353,13 @@ extern int	_pglstat64(const char *name, struct stat *buf);
  */
 #define O_DSYNC 0x0080
 
+/*
+ * Our open wrapper does not create inheritable handles, so it is safe to
+ * ignore O_CLOEXEC.  (If we were using Windows' own open(), it might be
+ * necessary to convert this to _O_NOINHERIT.)
+ */
+#define O_CLOEXEC 0
+
 /*
  * Supplement to <errno.h>.
  *
-- 
2.39.1

Reply via email to