On 18.08.24 15:35, Peter Eisentraut wrote:
On 17.08.24 00:01, Thomas Munro wrote:
Hmm. fd.h and fd.c test for F_NOCACHE, which is pretty closely
related. Now I'm wondering why we actually need this in
pg_config_manual.h at all. Who would turn it off at compile time, and
why would they not be satisfied with setting relevant GUCs to 0? Can
we just teach fd.h to define USE_PREFETCH if
defined(POSIX_FADV_WILLNEED) || defined(F_RDADVISE)?
I thought USE_PREFETCH existed so that we don't have the run-time
overhead for all the bookkeeping code if we don't have any OS-level
prefetch support at the end. But it looks like most of that bookkeeping
code is skipped anyway if the *_io_concurrency settings are at 0. So
yes, getting rid of USE_PREFETCH globally would be useful.
(I have also thought multiple times about removing the configure
probes for F_FULLFSYNC, and just doing #ifdef. Oh, that's in my patch
for CF #4453.)
Understandable, but we should be careful here that we don't create
setups that can cause bugs like
<https://www.postgresql.org/message-id/48da4a1f-ccd9-4988-9622-24f37b1de...@eisentraut.org>.
I think that's fine. I don't really like the word "prefetch", could
mean many different things. What about "requires OS support for
issuing read-ahead advice", which uses a word that appears in both of
those interfaces?
I like that term.
Here is another patch, with the documentation wording adjusted like
this, and a bit of other tidying.
I opted against pursuing some of the other refactoring ideas as part of
this. Removing USE_PREFETCH seems worthwhile, but has some open
questions. For example, pg_prewarm has a prefetch mode, which currently
errors if there is no prefetch support. So we'd still require some
knowledge outside of fd.c, unless we want to change that behavior. The
idea of creating a src/port/posix_fadvise.c also got a bit too
complicated in terms of how to weave that into configure and meson.
There are apparently various old problems, like old Linux systems with
incompatible declarations, or something like that, and then the special
casing of Solaris (which isn't even in meson.build). If we could get
rid of some of that, then it would be easier to add new behavior there,
otherwise it's just likely to break things. So I'm leaving these as
separate projects.
In terms of $subject, this patch seems sufficient for now.
From 60e86ec3b41ad0aef7e9aba4cfc0688604702047 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Fri, 23 Aug 2024 13:49:55 +0200
Subject: [PATCH v3] Add prefetching support on macOS
macOS doesn't have posix_fadvise(), but fcntl() with the F_RDADVISE
command does the same thing.
Some related documentation has been generalized to not mention
posix_advise() specifically anymore.
Discussion:
https://www.postgresql.org/message-id/flat/0827edec-1317-4917-a186-035eb1e3241d%40eisentraut.org
---
doc/src/sgml/config.sgml | 14 +++-----
doc/src/sgml/wal.sgml | 4 +--
src/backend/commands/variable.c | 4 +--
src/backend/storage/file/fd.c | 59 ++++++++++++++++++++++-----------
src/include/pg_config_manual.h | 5 ++-
src/include/port/darwin.h | 5 +++
6 files changed, 56 insertions(+), 35 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2937384b001..12feac60874 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2679,11 +2679,9 @@ <title>Asynchronous Behavior</title>
</para>
<para>
- Asynchronous I/O depends on an effective
<function>posix_fadvise</function>
- function, which some operating systems lack. If the function is not
- present then setting this parameter to anything but zero will result
- in an error. On some operating systems (e.g., Solaris), the function
- is present but does not actually do anything.
+ Asynchronous I/O requires that the operating system supports issuing
+ read-ahead advice. If there is no operating system support then
+ setting this parameter to anything but zero will result in an error.
</para>
<para>
@@ -3852,10 +3850,8 @@ <title>Recovery</title>
<literal>off</literal>, <literal>on</literal> and
<literal>try</literal> (the default). The setting
<literal>try</literal> enables
- prefetching only if the operating system provides the
- <function>posix_fadvise</function> function, which is currently used
- to implement prefetching. Note that some operating systems provide the
- function, but it doesn't do anything.
+ prefetching only if the operating system provides support for issuing
+ read-ahead advice.
</para>
<para>
Prefetching blocks that will soon be needed can reduce I/O wait times
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index d5df65bc693..0ba0c930b78 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -841,8 +841,8 @@ <title><acronym>WAL</acronym> Configuration</title>
The <xref linkend="guc-maintenance-io-concurrency"/> and
<xref linkend="guc-wal-decode-buffer-size"/> settings limit prefetching
concurrency and distance, respectively. By default, it is set to
- <literal>try</literal>, which enables the feature on systems where
- <function>posix_fadvise</function> is available.
+ <literal>try</literal>, which enables the feature on systems that support
+ issuing read-ahead advice.
</para>
</sect1>
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 6202c5ebe44..136c584305a 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -1212,7 +1212,7 @@ check_effective_io_concurrency(int *newval, void **extra,
GucSource source)
#ifndef USE_PREFETCH
if (*newval != 0)
{
- GUC_check_errdetail("\"effective_io_concurrency\" must be set
to 0 on platforms that lack posix_fadvise().");
+ GUC_check_errdetail("\"effective_io_concurrency\" must be set
to 0 on platforms that lack support for issuing read-ahead advice.");
return false;
}
#endif /* USE_PREFETCH */
@@ -1225,7 +1225,7 @@ check_maintenance_io_concurrency(int *newval, void
**extra, GucSource source)
#ifndef USE_PREFETCH
if (*newval != 0)
{
- GUC_check_errdetail("\"maintenance_io_concurrency\" must be set
to 0 on platforms that lack posix_fadvise().");
+ GUC_check_errdetail("\"maintenance_io_concurrency\" must be set
to 0 on platforms that lack support for issuing read-ahead advice.");
return false;
}
#endif /* USE_PREFETCH */
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 3944321ff37..2830b310e0b 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2068,40 +2068,61 @@ FileClose(File file)
/*
* FilePrefetch - initiate asynchronous read of a given range of the file.
*
- * Currently the only implementation of this function is using posix_fadvise
- * which is the simplest standardized interface that accomplishes this.
- * We could add an implementation using libaio in the future; but note that
- * this API is inappropriate for libaio, which wants to have a buffer provided
- * to read into.
+ * Returns 0 on success, otherwise an errno error code (like posix_fadvise()).
+ *
+ * posix_fadvise() is the simplest standardized interface that accomplishes
+ * this. We could add an implementation using libaio in the future; but note
+ * that this API is inappropriate for libaio, which wants to have a buffer
+ * provided to read into.
*/
int
FilePrefetch(File file, off_t offset, off_t amount, uint32 wait_event_info)
{
-#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
- int returnCode;
-
Assert(FileIsValid(file));
DO_DB(elog(LOG, "FilePrefetch: %d (%s) " INT64_FORMAT " " INT64_FORMAT,
file, VfdCache[file].fileName,
(int64) offset, (int64) amount));
- returnCode = FileAccess(file);
- if (returnCode < 0)
- return returnCode;
+#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
+ {
+ int returnCode;
+
+ returnCode = FileAccess(file);
+ if (returnCode < 0)
+ return returnCode;
retry:
- pgstat_report_wait_start(wait_event_info);
- returnCode = posix_fadvise(VfdCache[file].fd, offset, amount,
- POSIX_FADV_WILLNEED);
- pgstat_report_wait_end();
+ pgstat_report_wait_start(wait_event_info);
+ returnCode = posix_fadvise(VfdCache[file].fd, offset, amount,
+
POSIX_FADV_WILLNEED);
+ pgstat_report_wait_end();
- if (returnCode == EINTR)
- goto retry;
+ if (returnCode == EINTR)
+ goto retry;
- return returnCode;
+ return returnCode;
+ }
+#elif defined(__darwin__)
+ {
+ struct radvisory
+ {
+ off_t ra_offset; /* offset into the file
*/
+ int ra_count; /* size of the
read */
+ } ra;
+ int returnCode;
+
+ ra.ra_offset = offset;
+ ra.ra_count = amount;
+ pgstat_report_wait_start(wait_event_info);
+ returnCode = fcntl(VfdCache[file].fd, F_RDADVISE, &ra);
+ pgstat_report_wait_end();
+ if (returnCode != -1)
+ return 0;
+ else
+ return errno;
+ }
#else
- Assert(FileIsValid(file));
return 0;
#endif
}
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index e799c2989b8..e49eb13e43c 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -139,9 +139,8 @@
/*
* USE_PREFETCH code should be compiled only if we have a way to implement
* prefetching. (This is decoupled from USE_POSIX_FADVISE because there
- * might in future be support for alternative low-level prefetch APIs.
- * If you change this, you probably need to adjust the error message in
- * check_effective_io_concurrency.)
+ * might in future be support for alternative low-level prefetch APIs,
+ * as well as platform-specific APIs defined elsewhere.)
*/
#ifdef USE_POSIX_FADVISE
#define USE_PREFETCH
diff --git a/src/include/port/darwin.h b/src/include/port/darwin.h
index 15fb69d6dbb..6aa2ea70f6b 100644
--- a/src/include/port/darwin.h
+++ b/src/include/port/darwin.h
@@ -6,3 +6,8 @@
#define HAVE_FSYNC_WRITETHROUGH
#endif
+
+/*
+ * macOS has a platform-specific implementation of prefetching.
+ */
+#define USE_PREFETCH
base-commit: 94a3373ac5c3d2444b2379a3c185b986627c42d4
--
2.46.0