Hi,

Continuing a topic from earlier threads[1][2], I've been wondering
about how to de-klugify wal_sync_method=fsync_writethrough (a setting
that actually affects much more than just WAL), and how to do the
right thing for our users on macOS and Windows by default.  Commit
d0c28601 was a very small cleanup in this area.  Here are some bigger
ideas I'd like to try out.

Short version:

 * Make wal_sync_method=fdatasync the default everywhere
 * Drop wal_sync_method=fsync_writethrough
 * Add new macOS-only level for the fsync GUC: fsync=full
 * Make fsync=full redirect both pg_fsync() and pg_fdatasync()
 * Make fsync=full the default on macOS

Motivation:

I think expectations might have changed quite a bit since ~2000.  Back
then, fsync() didn't flush write caches on any OS (you were supposed
to use battery-backed controllers and SCSI as found on expensive
proprietary Unix systems if you were serious, IDE/ATA protocols didn't
originally have flush commands, and some consumer drives famously
ignored them or lied, so users of cheap drives were advised to turn
write caches off).  Around 2005, Linux decided to start sending the
flush command in fsync().  Windows' FlushFileBuffers() does the same,
and I gathered from Raymond Chen's blog that by the Windows 8
timeframe all consumer drive vendors supported and respected the flush
command.  macOS *still* doesn't send it for fsync(), but has had
fcntl(F_FULLFSYNC) since 2003.  In Apple's defence, they seem to have
been ahead of the curve on this problem[3]... I suppose they didn't
anticipate that everyone else was going to do it in their main
fsync()/fdatasync() call, they blazed their own trail, and now it all
looks a bit weird.

In other words, back then all systems running PostgreSQL risked data
loss unless you had fancy hardware or turned off unsafe caching.  But
now, due to the changing landscape and our policy choices, that is
true only for rarer systems by default while most in our community are
on Linux where this is all just a historical footnote.  People's
baseline expectations have moved, and although we try to document the
situation, they are occasionally very surprised: "Loaded footgun
open_datasync on Windows" was Laurenz Albe's reaction[4] to those
paragraphs.  Surely we should be able to recover after power loss by
default even on a lowly desktop PC or basic server loaded with SATA
drives, out of the box?

Proposal for Windows:

The existing default use of FILE_FLAG_WRITE_THROUGH is probably a
better choice on hardware where it works reliably (cache disabled,
non-volatile cache, or working FUA support), since it skips a system
call and doesn't wait for incidental other stuff in the cache to
flush, but it's well documented that Windows' SATA drivers neither
pass the "FUA" flag down to the device nor fall back to sending a full
cache flush command.  It's also easy to see in the pg_test_fsync
numbers, which are too good to be true on consumer gear.  Therefore
wal_sync_method=fdatasync is a better default level.  We map that to
NtFlushBuffersFileEx(FLUSH_FLAGS_FILE_DATA_SYNC_ONLY).  (The "SYNC" in
that flag name means flush the drive cache; the "DATA...ONLY" in that
flag name means skip non-essential stuff like file modification time
etc just like fdatasync() in POSIX, and goes visibly faster thanks to
not journaling metadata.)

Proposal for macOS:

Our current default isn't nice to users who run a database on
mains-powered Macs.  I don't have one myself to try it, but "man
fsync" clearly states that you can lose data and it is easily
demonstrated with a traditional cord-yanking test[5].  You could
certainly lose some recent commits; you could probably also get more
subtle corruption or a total recovery failure like [6] too, if for
example the control file can make it to durable storage and while
pointing to a checkpoint that did not (maybe a ZFS-like atomic
root-switch prevents that sort of disorder in APFS, I dunno, but I
read some semi-informed speculation that it doesn't work that way
*shrug*).

We do currently offer a non-default setting
wal_sync_method=fsync_writethough to address all this already.
Despite its name, it affects every caller of pg_fsync() (control file,
data files, etc).  It's certainly essential to flush all those files
fully too as part of our recovery protocol, but they're not "WAL".
The new idea here is to provide a separate way of controlling that
global behaviour, and I propose fsync=full.  Furthermore, I think that
setting should also affect pg_fdatasync(), given that Apple doesn't
even really have fdatasync() (perhaps if they carry out their threat
to implement it, they'll also invent F_FULLFDATASYNC; for now it
*seems* to be basically just another name for fsync() albeit
undeclared by <unistd.h>).

It's possible that fcntl(F_FULLFSYNC) might fail with ENOSUPP or other
errors in obscure cases (eg unusual file systems).  In that case, you
could manually lower fsync to just "on" and do your own research on
whether power loss can toast your database, but that doesn't seem like
a reason for us not to ship good solid defaults for typical users.

Rationale for changing wal_sync_method globally (for now):

With wal_sync_method=fdatasync as default for Linux, FreeBSD, OpenBSD,
DragonflyBSD already, if we added macOS and Windows, that'd leave only
NetBSD, AIX, Solaris/illumos.  I don't like having different and more
magical defaults on rare target OSes with no expert users left in our
community (as [6] reminded me), so I figure we'd be better off with
the same less magical setting everywhere, as a baseline.

Later we might want a per-platform default again.  For example, Linux
(like Windows) has policies on whether to believe FUA works reliably
for the purposes of O_DSYNC, but (unlike Windows) falls back to
sending cache flushes instead of doing nothing, so in theory
open_datasync might be a safe and sometimes better performing default
there.  If we decided to do that, we'd just restore the
PLATFORM_DEFAULT_SYNC_METHOD mechanism.

The only other OS where I have detailed enough knowledge to comment is
FreeBSD.  Its ZFS flushes caches for all levels just fine, so it
doesn't much matter, while its UFS never got that memo (so it's like a
Mac and probably other old Unixes; maybe I'll get that fixed, see
FreeBSD proposal D36371 if interested).  The reasons for using
fdatasync on both FreeBSD and Linux wasn't cache control policies, but
rather some obscure logic of ours that would turn on O_DIRECT in some
cases (and I think in the past when wal_level was lower by default, it
would have been common), which might have complications or fail.  The
last trace of that is gone since d4e71df6, so if we were to put Linux
on a 'known-good-for-open_datasync' list I'd probably also consider
putting FreeBSD on the list too.

Note that while this'll slow down some real world databases by being
more careful, 'meson test' time shouldn't be affected on any OS due to
use of fsync=off in tests.

Draft patches attached.

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
[3] https://lists.apple.com/archives/darwin-dev/2005/Feb/msg00087.html
[4] 
https://www.postgresql.org/message-id/flat/1527846213.2475.31.camel%40cybertec.at
[5] https://news.ycombinator.com/item?id=30372194
[6] 
https://www.postgresql.org/message-id/flat/18009-40a42f84af3fbda1%40postgresql.org
From 31517fe5e69225aaffb6b9a49246b45ddad8e028 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Sat, 15 Jul 2023 11:41:52 +1200
Subject: [PATCH 1/2] Make wal_sync_method=fdatasync the default on all OSes.

Previously, fdatasync was the default level on Linux and FreeBSD by
special rules, and on OpenBSD and DragonflyBSD because they didn't have
O_DSYNC != O_SYNC or O_DSYNC at all.  For every other system, we'd
choose open_datasync.

Use fdatasync everywhere, for consistency.  This became possible after
commit 9430fb40 added support for Windows, the last known system not to
have it.  This means that we'll now flush caches on consumer drives by
default on Windows, where previously we didn't, which seems like a
better default for crash safety.  Users who want the older behavior can
still request it with wal_sync_method=open_datasync.

It's entirely likely that we'll reintroduce per-platform choices in
future, but this commit reverses our previous assumption that
open_datasync is the best choice unless we hear otherwise.

Reviewed-by: Anton A. Melnikov <aamelni...@inbox.ru>
Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
---
 doc/src/sgml/config.sgml                      |  5 ++---
 src/backend/utils/misc/postgresql.conf.sample |  9 ++-------
 src/bin/pg_test_fsync/pg_test_fsync.c         |  4 ++--
 src/include/access/xlogdefs.h                 | 14 +-------------
 src/include/port/freebsd.h                    |  7 -------
 src/include/port/linux.h                      |  8 --------
 6 files changed, 7 insertions(+), 40 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c50c28546d..acb6666e0b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3156,9 +3156,8 @@ include_dir 'conf.d'
        </itemizedlist>
        <para>
         Not all of these choices are available on all platforms.
-        The default is the first method in the above list that is supported
-        by the platform, except that <literal>fdatasync</literal> is the default on
-        Linux and FreeBSD.  The default is not necessarily ideal; it might be
+        The default is <literal>fdatasync</literal>.
+        The default is not necessarily ideal; it might be
         necessary to change this setting or other aspects of your system
         configuration in order to create a crash-safe configuration or
         achieve optimal performance.
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e4c0269fa3..d466143e4a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -215,13 +215,8 @@
 					# unrecoverable data corruption)
 #synchronous_commit = on		# synchronization level;
 					# off, local, remote_write, remote_apply, or on
-#wal_sync_method = fsync		# the default is the first option
-					# supported by the operating system:
-					#   open_datasync
-					#   fdatasync (default on Linux and FreeBSD)
-					#   fsync
-					#   fsync_writethrough
-					#   open_sync
+#wal_sync_method = fdatasync		# fsync, fdatasync, fsync_writethrough,
+					# open_sync, open_datasync
 #full_page_writes = on			# recover from partial page writes
 #wal_log_hints = off			# also do full page writes of non-critical updates
 					# (change requires restart)
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index 14fa4acae2..5b431d2f99 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -292,7 +292,7 @@ test_sync(int writes_per_op)
 		printf(_("\nCompare file sync methods using one %dkB write:\n"), XLOG_BLCKSZ_K);
 	else
 		printf(_("\nCompare file sync methods using two %dkB writes:\n"), XLOG_BLCKSZ_K);
-	printf(_("(in wal_sync_method preference order, except fdatasync is Linux's default)\n"));
+	printf(_("(fdatasync is the default)\n"));
 
 	/*
 	 * Test open_datasync if available
@@ -326,7 +326,7 @@ test_sync(int writes_per_op)
 #endif
 
 /*
- * Test fdatasync if available
+ * Test fdatasync
  */
 	printf(LABEL_FORMAT, "fdatasync");
 	fflush(stdout);
diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index fe794c7740..a628976902 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -64,19 +64,7 @@ typedef uint32 TimeLineID;
  */
 typedef uint16 RepOriginId;
 
-/*
- * This chunk of hackery attempts to determine which file sync methods
- * are available on the current platform, and to choose an appropriate
- * default method.
- *
- * Note that we define our own O_DSYNC on Windows, but not O_SYNC.
- */
-#if defined(PLATFORM_DEFAULT_SYNC_METHOD)
-#define DEFAULT_SYNC_METHOD		PLATFORM_DEFAULT_SYNC_METHOD
-#elif defined(O_DSYNC) && (!defined(O_SYNC) || O_DSYNC != O_SYNC)
-#define DEFAULT_SYNC_METHOD		SYNC_METHOD_OPEN_DSYNC
-#else
+/* Default synchronization method for WAL. */
 #define DEFAULT_SYNC_METHOD		SYNC_METHOD_FDATASYNC
-#endif
 
 #endif							/* XLOG_DEFS_H */
diff --git a/src/include/port/freebsd.h b/src/include/port/freebsd.h
index 0e3fde55d6..2e36d3da4f 100644
--- a/src/include/port/freebsd.h
+++ b/src/include/port/freebsd.h
@@ -1,8 +1 @@
 /* src/include/port/freebsd.h */
-
-/*
- * Set the default wal_sync_method to fdatasync.  xlogdefs.h's normal rules
- * would prefer open_datasync on FreeBSD 13+, but that is not a good choice on
- * many systems.
- */
-#define PLATFORM_DEFAULT_SYNC_METHOD	SYNC_METHOD_FDATASYNC
diff --git a/src/include/port/linux.h b/src/include/port/linux.h
index 7a6e46cdbb..acd867606c 100644
--- a/src/include/port/linux.h
+++ b/src/include/port/linux.h
@@ -12,11 +12,3 @@
  * to have a kernel version test here.
  */
 #define HAVE_LINUX_EIDRM_BUG
-
-/*
- * Set the default wal_sync_method to fdatasync.  With recent Linux versions,
- * xlogdefs.h's normal rules will prefer open_datasync, which (a) doesn't
- * perform better and (b) causes outright failures on ext4 data=journal
- * filesystems, because those don't support O_DIRECT.
- */
-#define PLATFORM_DEFAULT_SYNC_METHOD	SYNC_METHOD_FDATASYNC
-- 
2.39.2

From 761792297fffbcabf356cdcb5471845845e54ca7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Fri, 14 Jul 2023 21:51:44 +1200
Subject: [PATCH 2/2] Remove fsync_writethrough, add fsync=full (macOS only).

Previously, wal_sync_method=fsync_writethrough affected pg_fsync() of
WAL, control, data and other files, by causing pg_fsync() to use the
fcntl(F_FULLFSYNC) call that Apple recommends for databases.

That's a little confusing, because the GUC purports to affect only WAL
files, and it was also not used by default, exposing users to data loss
risk.  New approach:

1.  wal_sync_method=fsync_writethrough is removed.  Instead, we will
make the existing fsync and fdatasync methods do what Apple recommends.

2.  Introduce a new setting fsync=full.  This refactoring reflects the
fact that all pg_fsync() calls are affected, not just those for WAL
files.  According to expectations established by other modern platforms,
fcntl(F_FULLFSYNC) is the "true" fsync operation on this platform, and
fsync=full will select that.  The traditional fsync() system call is
still available by setting fsync=on.

3.  Use fcntl(F_FULLSYNC) for pg_fdatasync() too, if fsync=full.  Apple
doesn't exactly implement fdatasync().  It's not documented, but exists
in some partially implemented form, without a declaration.  As far as we
can tell, it behaves just like fsync().  We'll still use it if you ask
for it with fsync=on.

4.  Use fsync=full by default on Macs, following Apple's recommendation
for applications such as databases.  It is not available on non-Macs, to
avoid confusing people on other operating systems.  Since
wal_sync_method's default was changed to fdatasync by an earlier commit,
the net effect is that Macs now use F_FULLSYNC for all file
synchronization (data, control, WAL, ...) by default.

While here, also get rid of configure probes for F_FULLFSYNC, and just
test for its existence with #ifdef.
---
 doc/src/sgml/config.sgml                      | 31 ++++++--
 doc/src/sgml/monitoring.sgml                  |  8 +-
 doc/src/sgml/wal.sgml                         | 14 +---
 meson.build                                   |  1 -
 src/backend/access/transam/xlog.c             | 18 +----
 src/backend/storage/file/fd.c                 | 74 +++++++------------
 src/backend/storage/sync/sync.c               |  2 +-
 src/backend/utils/init/globals.c              |  2 +-
 src/backend/utils/misc/guc_tables.c           | 41 +++++++---
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/bin/pg_test_fsync/pg_test_fsync.c         | 33 ++-------
 src/include/access/xlog.h                     |  3 +-
 src/include/miscadmin.h                       | 15 +++-
 src/include/port/darwin.h                     |  5 --
 src/include/storage/fd.h                      |  2 -
 src/tools/msvc/Solution.pm                    |  1 -
 16 files changed, 118 insertions(+), 134 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index acb6666e0b..b34d3901f2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2898,7 +2898,7 @@ include_dir 'conf.d'
      </varlistentry>
 
      <varlistentry id="guc-fsync" xreflabel="fsync">
-      <term><varname>fsync</varname> (<type>boolean</type>)
+      <term><varname>fsync</varname> (<type>enum</type>)
       <indexterm>
        <primary><varname>fsync</varname> configuration parameter</primary>
       </indexterm>
@@ -2913,6 +2913,22 @@ include_dir 'conf.d'
         consistent state after an operating system or hardware crash.
        </para>
 
+       <para>
+        On macOS only, the parameter can also be set to
+        <literal>full</literal>, and that is the default.
+        This value causes <productname>PostgreSQL</productname> to replace all
+        calls to <function>fsync()</function> with
+        <function>fcntl(fd, F_FULLFSYNC)</function>, as recommended by
+        Apple's documentation.  It also does the same for calls to
+        <function>fdatasync()</function>, a function that exists but is not
+        documented by Apple.
+        Setting it to <literal>on</literal> causes macOS's regular
+        <function>fsync()</function> and <function>fdatasync()</function>
+        functions to be used as on other platforms.  They are known to be
+        considerably faster, but users should consult the warnings in Apple's
+        manual page for <function>fsync</function> first.
+       </para>
+
        <para>
         While turning off <varname>fsync</varname> is often a performance
         benefit, this can result in unrecoverable data corruption in
@@ -3144,11 +3160,6 @@ include_dir 'conf.d'
         </para>
         </listitem>
         <listitem>
-        <para>
-         <literal>fsync_writethrough</literal> (call <function>fsync()</function> at each commit, forcing write-through of any disk write cache)
-        </para>
-        </listitem>
-        <listitem>
         <para>
          <literal>open_sync</literal> (write WAL files with <function>open()</function> option <symbol>O_SYNC</symbol>)
         </para>
@@ -3165,6 +3176,14 @@ include_dir 'conf.d'
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.
        </para>
+       <para>
+        Note that on macOS, the <xref linkend="guc-fsync"/> setting can modify
+        the behavior of the <literal>fdatasync</literal> and
+        <literal>fsync</literal> levels, since
+        it can cause all calls to <function>fdatasync()</function> and
+        <function>fsync()</function> to be replaced with an Apple-specific
+        <function>fcntl()</function> call.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 588b720f57..87dd026bc9 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3055,8 +3055,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
        <function>issue_xlog_fsync</function> request
        (if <xref linkend="guc-fsync"/> is <literal>on</literal> and
        <xref linkend="guc-wal-sync-method"/> is either
-       <literal>fdatasync</literal>, <literal>fsync</literal> or
-       <literal>fsync_writethrough</literal>, otherwise zero).
+       <literal>fdatasync</literal> or <literal>fsync</literal>,
+       otherwise zero).
        See <xref linkend="wal-configuration"/> for more information about
        the internal WAL function <function>issue_xlog_fsync</function>.
       </para></entry>
@@ -3086,8 +3086,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
        (if <varname>track_wal_io_timing</varname> is enabled,
        <varname>fsync</varname> is <literal>on</literal>, and
        <varname>wal_sync_method</varname> is either
-       <literal>fdatasync</literal>, <literal>fsync</literal> or
-       <literal>fsync_writethrough</literal>, otherwise zero).
+       <literal>fdatasync</literal> or <literal>fsync</literal>,
+       otherwise zero).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 4aad0e1a07..342fc4c6c8 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -114,12 +114,6 @@
       </para>
     </listitem>
 
-    <listitem>
-      <para>
-        On <productname>macOS</productname>, write caching can be prevented by
-        setting <varname>wal_sync_method</varname> to <literal>fsync_writethrough</literal>.
-      </para>
-    </listitem>
   </itemizedlist>
 
   <para>
@@ -758,10 +752,6 @@
    The <xref linkend="guc-wal-sync-method"/> parameter determines how
    <productname>PostgreSQL</productname> will ask the kernel to force
    <acronym>WAL</acronym> updates out to disk.
-   All the options should be the same in terms of reliability, with
-   the exception of <literal>fsync_writethrough</literal>, which can sometimes
-   force a flush of the disk cache even when other options do not do so.
-   However, it's quite platform-specific which one will be the fastest.
    You can test the speeds of different options using the <xref
    linkend="pgtestfsync"/> program.
    Note that this parameter is irrelevant if <varname>fsync</varname>
@@ -795,8 +785,8 @@
    <literal>open_datasync</literal> or <literal>open_sync</literal>,
    a write operation in <function>XLogWrite</function> guarantees to sync written
    WAL data to disk and <function>issue_xlog_fsync</function> does nothing.
-   If <varname>wal_sync_method</varname> is either <literal>fdatasync</literal>,
-   <literal>fsync</literal>, or <literal>fsync_writethrough</literal>,
+   If <varname>wal_sync_method</varname> is either <literal>fdatasync</literal>
+   or <literal>fsync</literal>,
    the write operation moves WAL buffers to kernel cache and
    <function>issue_xlog_fsync</function> syncs them to disk. Regardless
    of the setting of <varname>track_wal_io_timing</varname>, the number
diff --git a/meson.build b/meson.build
index 04ea348852..c37332b4d5 100644
--- a/meson.build
+++ b/meson.build
@@ -2163,7 +2163,6 @@ endforeach
 
 
 decl_checks = [
-  ['F_FULLFSYNC', 'fcntl.h'],
   ['fdatasync', 'unistd.h'],
   ['posix_fadvise', 'fcntl.h'],
   ['strlcat', 'string.h'],
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8b0710abe6..d7241766de 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -173,9 +173,6 @@ static bool check_wal_consistency_checking_deferred = false;
  */
 const struct config_enum_entry sync_method_options[] = {
 	{"fsync", SYNC_METHOD_FSYNC, false},
-#ifdef HAVE_FSYNC_WRITETHROUGH
-	{"fsync_writethrough", SYNC_METHOD_FSYNC_WRITETHROUGH, false},
-#endif
 	{"fdatasync", SYNC_METHOD_FDATASYNC, false},
 #ifdef O_SYNC
 	{"open_sync", SYNC_METHOD_OPEN, false},
@@ -2613,7 +2610,7 @@ XLogFlush(XLogRecPtr record)
 		 * We do not sleep if enableFsync is not turned on, nor if there are
 		 * fewer than CommitSiblings other backends with active transactions.
 		 */
-		if (CommitDelay > 0 && enableFsync &&
+		if (CommitDelay > 0 && enableFsync != ENABLE_FSYNC_OFF &&
 			MinimumActiveBackends(CommitSiblings))
 		{
 			pg_usleep(CommitDelay);
@@ -8084,7 +8081,7 @@ get_sync_bit(int method)
 		o_direct_flag = PG_O_DIRECT;
 
 	/* If fsync is disabled, never open in sync mode */
-	if (!enableFsync)
+	if (enableFsync == ENABLE_FSYNC_OFF)
 		return o_direct_flag;
 
 	switch (method)
@@ -8096,7 +8093,6 @@ get_sync_bit(int method)
 			 * be seen here.
 			 */
 		case SYNC_METHOD_FSYNC:
-		case SYNC_METHOD_FSYNC_WRITETHROUGH:
 		case SYNC_METHOD_FDATASYNC:
 			return o_direct_flag;
 #ifdef O_SYNC
@@ -8171,7 +8167,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 	 * Quick exit if fsync is disabled or write() has already synced the WAL
 	 * file.
 	 */
-	if (!enableFsync ||
+	if (enableFsync == ENABLE_FSYNC_OFF ||
 		sync_method == SYNC_METHOD_OPEN ||
 		sync_method == SYNC_METHOD_OPEN_DSYNC)
 		return;
@@ -8186,15 +8182,9 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 	switch (sync_method)
 	{
 		case SYNC_METHOD_FSYNC:
-			if (pg_fsync_no_writethrough(fd) != 0)
+			if (pg_fsync(fd) != 0)
 				msg = _("could not fsync file \"%s\": %m");
 			break;
-#ifdef HAVE_FSYNC_WRITETHROUGH
-		case SYNC_METHOD_FSYNC_WRITETHROUGH:
-			if (pg_fsync_writethrough(fd) != 0)
-				msg = _("could not fsync write-through file \"%s\": %m");
-			break;
-#endif
 		case SYNC_METHOD_FDATASYNC:
 			if (pg_fdatasync(fd) != 0)
 				msg = _("could not fdatasync file \"%s\": %m");
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index a027a8aabc..f8df7dbeae 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -355,11 +355,12 @@ static int	fsync_parent_path(const char *fname, int elevel);
 
 
 /*
- * pg_fsync --- do fsync with or without writethrough
+ * pg_fsync --- do fsync, if enabled, or redirect to Apple's F_FULLFSYNC.
  */
 int
 pg_fsync(int fd)
 {
+	int			rc;
 #if !defined(WIN32) && defined(USE_ASSERT_CHECKING)
 	struct stat st;
 
@@ -398,30 +399,20 @@ pg_fsync(int fd)
 	errno = 0;
 #endif
 
-	/* #if is to skip the sync_method test if there's no need for it */
-#if defined(HAVE_FSYNC_WRITETHROUGH)
-	if (sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH)
-		return pg_fsync_writethrough(fd);
-	else
-#endif
-		return pg_fsync_no_writethrough(fd);
-}
-
-
-/*
- * pg_fsync_no_writethrough --- same as fsync except does nothing if
- *	enableFsync is off
- */
-int
-pg_fsync_no_writethrough(int fd)
-{
-	int			rc;
-
-	if (!enableFsync)
+	if (enableFsync == ENABLE_FSYNC_OFF)
 		return 0;
 
 retry:
-	rc = fsync(fd);
+#if defined(ENABLE_FSYNC_FULL)
+	if (enableFsync == ENABLE_FSYNC_FULL)
+	{
+		rc = fcntl(fd, F_FULLFSYNC);
+	}
+	else
+#endif
+	{
+		rc = fsync(fd);
+	}
 
 	if (rc == -1 && errno == EINTR)
 		goto retry;
@@ -430,37 +421,28 @@ retry:
 }
 
 /*
- * pg_fsync_writethrough
- */
-int
-pg_fsync_writethrough(int fd)
-{
-	if (enableFsync)
-	{
-#if defined(F_FULLFSYNC)
-		return (fcntl(fd, F_FULLFSYNC, 0) == -1) ? -1 : 0;
-#else
-		errno = ENOSYS;
-		return -1;
-#endif
-	}
-	else
-		return 0;
-}
-
-/*
- * pg_fdatasync --- same as fdatasync except does nothing if enableFsync is off
+ * pg_fdatasync --- same as fdatasync except does nothing if enableFsync is off,
+ * and redirects to Apple's F_FULLSYNC if configured.
  */
 int
 pg_fdatasync(int fd)
 {
 	int			rc;
 
-	if (!enableFsync)
+	if (enableFsync == ENABLE_FSYNC_OFF)
 		return 0;
 
 retry:
-	rc = fdatasync(fd);
+#if defined(ENABLE_FSYNC_FULL)
+	if (enableFsync == ENABLE_FSYNC_FULL)
+	{
+		rc = fcntl(fd, F_FULLFSYNC);
+	}
+	else
+#endif
+	{
+		rc = fdatasync(fd);
+	}
 
 	if (rc == -1 && errno == EINTR)
 		goto retry;
@@ -482,7 +464,7 @@ pg_flush_data(int fd, off_t offset, off_t nbytes)
 	 * if fsyncs are disabled - that's a decision we might want to make
 	 * configurable at some point.
 	 */
-	if (!enableFsync)
+	if (enableFsync == ENABLE_FSYNC_OFF)
 		return;
 
 	/*
@@ -3491,7 +3473,7 @@ SyncDataDirectory(void)
 	bool		xlog_is_symlink;
 
 	/* We can skip this whole thing if fsync is disabled. */
-	if (!enableFsync)
+	if (enableFsync == ENABLE_FSYNC_OFF)
 		return;
 
 	/*
diff --git a/src/backend/storage/sync/sync.c b/src/backend/storage/sync/sync.c
index 04fcb06056..1f8d368e08 100644
--- a/src/backend/storage/sync/sync.c
+++ b/src/backend/storage/sync/sync.c
@@ -386,7 +386,7 @@ ProcessSyncRequests(void)
 		 * all.  (We delay checking until this point so that changing fsync on
 		 * the fly behaves sensibly.)
 		 */
-		if (enableFsync)
+		if (enableFsync != ENABLE_FSYNC_OFF)
 		{
 			/*
 			 * If in checkpointer, we want to absorb pending requests every so
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 011ec18015..78d5ef6a63 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -120,7 +120,7 @@ int			DateStyle = USE_ISO_DATES;
 int			DateOrder = DATEORDER_MDY;
 int			IntervalStyle = INTSTYLE_POSTGRES;
 
-bool		enableFsync = true;
+int			enableFsync = DEFAULT_ENABLE_FSYNC;
 bool		allowSystemTableMods = false;
 int			work_mem = 4096;
 double		hash_mem_multiplier = 2.0;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 93dc2e7680..df44f853d4 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -477,6 +477,22 @@ static const struct config_enum_entry wal_compression_options[] = {
 	{NULL, 0, false}
 };
 
+static const struct config_enum_entry enable_fsync_options[] = {
+	{"on", ENABLE_FSYNC_ON, false},
+	{"off", ENABLE_FSYNC_OFF, false},
+	{"true", ENABLE_FSYNC_ON, true},
+	{"false", ENABLE_FSYNC_OFF, true},
+	{"yes", ENABLE_FSYNC_ON, true},
+	{"no", ENABLE_FSYNC_OFF, true},
+	{"1", ENABLE_FSYNC_ON, true},
+	{"0", ENABLE_FSYNC_OFF, true},
+#ifdef ENABLE_FSYNC_FULL
+	{"full", ENABLE_FSYNC_FULL, false},
+#endif
+	{NULL, 0, false}
+};
+
+
 /*
  * Options for enum values stored in other modules
  */
@@ -1095,18 +1111,6 @@ struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
-	{
-		{"fsync", PGC_SIGHUP, WAL_SETTINGS,
-			gettext_noop("Forces synchronization of updates to disk."),
-			gettext_noop("The server will use the fsync() system call in several places to make "
-						 "sure that updates are physically written to disk. This insures "
-						 "that a database cluster will recover to a consistent state after "
-						 "an operating system or hardware crash.")
-		},
-		&enableFsync,
-		true,
-		NULL, NULL, NULL
-	},
 	{
 		{"ignore_checksum_failure", PGC_SUSET, DEVELOPER_OPTIONS,
 			gettext_noop("Continues processing after a checksum failure."),
@@ -4643,6 +4647,19 @@ struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"fsync", PGC_SIGHUP, WAL_SETTINGS,
+			gettext_noop("Forces synchronization of updates to disk."),
+			gettext_noop("Whether to use syncronization APIs to make "
+						 "sure that updates are physically written to disk. This insures "
+						 "that a database cluster will recover to a consistent state after "
+						 "an operating system or hardware crash.")
+		},
+		&enableFsync,
+		DEFAULT_ENABLE_FSYNC, enable_fsync_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"transaction_isolation", PGC_USERSET, CLIENT_CONN_STATEMENT,
 			gettext_noop("Sets the current transaction's isolation level."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d466143e4a..f2ec3b68dd 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -215,7 +215,7 @@
 					# unrecoverable data corruption)
 #synchronous_commit = on		# synchronization level;
 					# off, local, remote_write, remote_apply, or on
-#wal_sync_method = fdatasync		# fsync, fdatasync, fsync_writethrough,
+#wal_sync_method = fdatasync		# fsync, fdatasync,
 					# open_sync, open_datasync
 #full_page_writes = on			# recover from partial page writes
 #wal_log_hints = off			# also do full page writes of non-critical updates
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index 5b431d2f99..35bc604cf1 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -87,9 +87,6 @@ static DWORD WINAPI process_alarm(LPVOID param);
 #endif
 static void signal_cleanup(SIGNAL_ARGS);
 
-#ifdef HAVE_FSYNC_WRITETHROUGH
-static int	pg_fsync_writethrough(int fd);
-#endif
 static void print_elapse(struct timeval start_t, struct timeval stop_t, int ops);
 
 #define die(msg) pg_fatal("%s: %m", _(msg))
@@ -292,7 +289,7 @@ test_sync(int writes_per_op)
 		printf(_("\nCompare file sync methods using one %dkB write:\n"), XLOG_BLCKSZ_K);
 	else
 		printf(_("\nCompare file sync methods using two %dkB writes:\n"), XLOG_BLCKSZ_K);
-	printf(_("(fdatasync is the default)\n"));
+	printf(_("(fdatasync is the default, but see docs for macOS)\n"));
 
 	/*
 	 * Test open_datasync if available
@@ -371,12 +368,13 @@ test_sync(int writes_per_op)
 	close(tmpfile);
 
 /*
- * If fsync_writethrough is available, test as well
+ * Test the macOS fcntl that use instead of fsync/fdatasync levels if
+ * fsync=full.
  */
-	printf(LABEL_FORMAT, "fsync_writethrough");
+	printf(LABEL_FORMAT, "fcntl(F_FULLFSYNC)");
 	fflush(stdout);
 
-#ifdef HAVE_FSYNC_WRITETHROUGH
+#ifdef F_FULLFSYNC
 	if ((tmpfile = open(filename, O_RDWR | PG_BINARY, 0)) == -1)
 		die("could not open output file");
 	START_TIMER;
@@ -388,8 +386,8 @@ test_sync(int writes_per_op)
 						  XLOG_BLCKSZ,
 						  writes * XLOG_BLCKSZ) != XLOG_BLCKSZ)
 				die("write failed");
-		if (pg_fsync_writethrough(tmpfile) != 0)
-			die("fsync failed");
+		if (fcntl(tmpfile, F_FULLFSYNC) != 0)
+			die("fcntl failed");
 	}
 	STOP_TIMER;
 	close(tmpfile);
@@ -504,8 +502,7 @@ test_file_descriptor_sync(void)
 	/*
 	 * Test whether fsync can sync data written on a different descriptor for
 	 * the same file.  This checks the efficiency of multi-process fsyncs
-	 * against the same file. Possibly this should be done with writethrough
-	 * on platforms which support it.
+	 * against the same file.
 	 */
 	printf(_("\nTest if fsync on non-write file descriptor is honored:\n"));
 	printf(_("(If the times are similar, fsync() can sync data written on a different\n"
@@ -600,20 +597,6 @@ signal_cleanup(SIGNAL_ARGS)
 	exit(1);
 }
 
-#ifdef HAVE_FSYNC_WRITETHROUGH
-
-static int
-pg_fsync_writethrough(int fd)
-{
-#if defined(F_FULLFSYNC)
-	return (fcntl(fd, F_FULLFSYNC, 0) == -1) ? -1 : 0;
-#else
-	errno = ENOSYS;
-	return -1;
-#endif
-}
-#endif
-
 /*
  * print out the writes per second for tests
  */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 48ca852381..0386c2c7a3 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -22,8 +22,7 @@
 #define SYNC_METHOD_FSYNC		0
 #define SYNC_METHOD_FDATASYNC	1
 #define SYNC_METHOD_OPEN		2	/* for O_SYNC */
-#define SYNC_METHOD_FSYNC_WRITETHROUGH	3
-#define SYNC_METHOD_OPEN_DSYNC	4	/* for O_DSYNC */
+#define SYNC_METHOD_OPEN_DSYNC	3	/* for O_DSYNC */
 extern PGDLLIMPORT int sync_method;
 
 extern PGDLLIMPORT XLogRecPtr ProcLastRecPtr;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 14bd574fc2..26d2ad38b7 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -23,6 +23,7 @@
 #ifndef MISCADMIN_H
 #define MISCADMIN_H
 
+#include <fcntl.h>
 #include <signal.h>
 
 #include "datatype/timestamp.h" /* for TimestampTz */
@@ -256,7 +257,19 @@ extern PGDLLIMPORT int IntervalStyle;
 
 #define MAXTZLEN		10		/* max TZ name len, not counting tr. null */
 
-extern PGDLLIMPORT bool enableFsync;
+#define ENABLE_FSYNC_OFF			0
+#define ENABLE_FSYNC_ON				1
+#ifdef F_FULLFSYNC
+#define ENABLE_FSYNC_FULL			2
+#endif
+
+#ifdef ENABLE_FSYNC_FULL
+#define DEFAULT_ENABLE_FSYNC ENABLE_FSYNC_FULL
+#else
+#define DEFAULT_ENABLE_FSYNC ENABLE_FSYNC_ON
+#endif
+
+extern PGDLLIMPORT int enableFsync;
 extern PGDLLIMPORT bool allowSystemTableMods;
 extern PGDLLIMPORT int work_mem;
 extern PGDLLIMPORT double hash_mem_multiplier;
diff --git a/src/include/port/darwin.h b/src/include/port/darwin.h
index 15fb69d6db..41e2a427d3 100644
--- a/src/include/port/darwin.h
+++ b/src/include/port/darwin.h
@@ -1,8 +1,3 @@
 /* src/include/port/darwin.h */
 
 #define __darwin__	1
-
-#if HAVE_DECL_F_FULLFSYNC		/* not present before macOS 10.3 */
-#define HAVE_FSYNC_WRITETHROUGH
-
-#endif
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 6791a406fc..ed49795f97 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -183,8 +183,6 @@ extern void RemovePgTempFilesInDir(const char *tmpdirname, bool missing_ok,
 extern bool looks_like_temp_rel_name(const char *name);
 
 extern int	pg_fsync(int fd);
-extern int	pg_fsync_no_writethrough(int fd);
-extern int	pg_fsync_writethrough(int fd);
 extern int	pg_fdatasync(int fd);
 extern void pg_flush_data(int fd, off_t offset, off_t nbytes);
 extern int	pg_truncate(const char *path, off_t length);
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 1cbc857e35..ee963e8351 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -232,7 +232,6 @@ sub GenerateFiles
 		HAVE_CRTDEFS_H => undef,
 		HAVE_CRYPTO_LOCK => undef,
 		HAVE_DECL_FDATASYNC => 0,
-		HAVE_DECL_F_FULLFSYNC => 0,
 		HAVE_DECL_LLVMCREATEGDBREGISTRATIONLISTENER => 0,
 		HAVE_DECL_LLVMCREATEPERFJITEVENTLISTENER => 0,
 		HAVE_DECL_LLVMGETHOSTCPUNAME => 0,
-- 
2.39.2

Reply via email to