On 2019-03-29 14:05, Michael Paquier wrote: > Yes, I think that we are going to need an equivalent of that for all > frontend tools. At various degrees, making sure that a fsync happens > is also important for pg_dump, pg_basebackup, pg_rewind and > pg_checksums so it is not only a problem of the two tools you mention. > It seems to me that the most correct way to have those failures would > be to use directly exit(EXIT_FAILURE) in file_utils.c where > appropriate.
Yeah, there is more to do. The reason I'm focusing on these two right now is that they would typically run as a background service, and a clean exit is most important there. In the other cases, the program runs more often in the foreground and you can see error messages. There are also some cases where fsync() failures are intentionally ignored ((void) casts), so some of that would need to be investigated further. Here is a patch to get started. Note that these calls don't go through file_utils.c, so it's a separate issue anyway. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 8f275321775f41d4a46ea6e1b10765c68fe2653d Mon Sep 17 00:00:00 2001 From: Peter Eisentraut <pe...@eisentraut.org> Date: Tue, 25 Jun 2019 14:19:26 +0200 Subject: [PATCH] Handle fsync failures in pg_receivewal and pg_recvlogical It is not safe to simply report an fsync error and continue. We must exit the program instead. --- src/bin/pg_basebackup/pg_recvlogical.c | 4 ++-- src/bin/pg_basebackup/receivelog.c | 12 ++++++------ src/bin/pg_basebackup/walmethods.c | 2 +- 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c index b029118bf6..9d8f8c74f8 100644 --- a/src/bin/pg_basebackup/pg_recvlogical.c +++ b/src/bin/pg_basebackup/pg_recvlogical.c @@ -192,8 +192,8 @@ OutputFsync(TimestampTz now) if (fsync(outfd) != 0) { - pg_log_error("could not fsync file \"%s\": %m", outfile); - return false; + pg_log_fatal("could not fsync file \"%s\": %m", outfile); + exit(2); } return true; diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c index 2fd16421aa..81fc1b0a01 100644 --- a/src/bin/pg_basebackup/receivelog.c +++ b/src/bin/pg_basebackup/receivelog.c @@ -134,10 +134,10 @@ open_walfile(StreamCtl *stream, XLogRecPtr startpoint) /* fsync file in case of a previous crash */ if (stream->walmethod->sync(f) != 0) { - pg_log_error("could not fsync existing write-ahead log file \"%s\": %s", + pg_log_fatal("could not fsync existing write-ahead log file \"%s\": %s", fn, stream->walmethod->getlasterror()); stream->walmethod->close(f, CLOSE_UNLINK); - return false; + exit(2); } walfile = f; @@ -763,9 +763,9 @@ HandleCopyStream(PGconn *conn, StreamCtl *stream, { if (stream->walmethod->sync(walfile) != 0) { - pg_log_error("could not fsync file \"%s\": %s", + pg_log_fatal("could not fsync file \"%s\": %s", current_walfile_name, stream->walmethod->getlasterror()); - goto error; + exit(2); } lastFlushPosition = blockpos; @@ -1015,9 +1015,9 @@ ProcessKeepaliveMsg(PGconn *conn, StreamCtl *stream, char *copybuf, int len, */ if (stream->walmethod->sync(walfile) != 0) { - pg_log_error("could not fsync file \"%s\": %s", + pg_log_fatal("could not fsync file \"%s\": %s", current_walfile_name, stream->walmethod->getlasterror()); - return false; + exit(2); } lastFlushPosition = blockpos; } diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c index 83b520898b..d8abea6dc6 100644 --- a/src/bin/pg_basebackup/walmethods.c +++ b/src/bin/pg_basebackup/walmethods.c @@ -864,7 +864,7 @@ tar_close(Walfile f, WalCloseMethod method) /* Always fsync on close, so the padding gets fsynced */ if (tar_sync(f) < 0) - return -1; + exit(2); /* Clean up and done */ pg_free(tf->pathname); -- 2.22.0