On 14.08.24 16:39, Peter Eisentraut wrote:
On 14.08.24 14:36, Thomas Munro wrote:
On Wed, Aug 14, 2024 at 7:04 PM Peter Eisentraut
<pe...@eisentraut.org> wrote:
Attached is a patch to implement this. It seems to work, but of course
it's kind of hard to tell whether it actually does anything useful.
Header order problem: pg_config_os.h defines __darwin__, but
pg_config_manual.h is included first, and tests __darwin__. I hacked
my way around that, and then made a table of 40,000,000 integers in a
2GB buffer pool. I used "select count(pg_buffercache_evict(buffered))
from pg_buffer_cache", and "sudo purge", to clear the two layers of
cache for each test, and then measured:
maintenance_io_concurrency=0, ANALYZE: 2311ms
maintenance_io_concurrency=10, ANALYZE: 652ms
maintenance_io_concurrency=25, ANALYZE: 389ms
It works!
Cool! I'll work on a more polished patch.
Here it is.
Some interesting questions:
What to do about the order of the symbols and include files. I threw
something into src/include/port/darwin.h, but I'm not sure if that's
good. Alternatively, we could not use __darwin__ but instead the more
standard and predefined defined(__APPLE__) && defined(__MACH__).
How to document it. The current documentation makes references mainly
to the availability of posix_fadvise(). That seems quite low-level.
How could a user of a prepared package even find out about that? Should
we just say "requires OS support" (kind of like I did here) and you can
query the effective state by looking at the *_io_concurrency settings?
Or do we need a read-only parameter that shows whether prefetch support
exists (kind of along the lines of huge pages)?
Btw., for context, here is what I gather the prefetch support (with this
patch) is:
cygwin posix_fadvise
darwin fcntl
freebsd posix_fadvise
linux posix_fadvise
netbsd posix_fadvise
openbsd no
solaris fake
win32 no
(There is also the possibility that we provide an implementation of
posix_fadvise() for macOS that wraps the platform-specific code in this
patch. And then Apple could just take that. ;-) )
From d318c4108b9e05a4828f6e7f71af34c3ca89b3ed Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Fri, 16 Aug 2024 20:15:43 +0200
Subject: [PATCH v2] Add prefetching support on macOS
macOS doesn't have posix_fadvise(), but fcntl() with the F_RDADVISE
command does the same thing.
Discussion:
https://www.postgresql.org/message-id/flat/0827edec-1317-4917-a186-035eb1e3241d%40eisentraut.org
---
doc/src/sgml/config.sgml | 14 +++-----
doc/src/sgml/wal.sgml | 4 +--
src/backend/commands/variable.c | 4 +--
src/backend/storage/file/fd.c | 59 ++++++++++++++++++++++-----------
src/include/pg_config_manual.h | 7 ++--
src/include/port/darwin.h | 5 +++
6 files changed, 57 insertions(+), 36 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2937384b001..c6d2fa2148e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2679,11 +2679,10 @@ <title>Asynchronous Behavior</title>
</para>
<para>
- Asynchronous I/O depends on an effective
<function>posix_fadvise</function>
- function, which some operating systems lack. If the function is not
- present then setting this parameter to anything but zero will result
- in an error. On some operating systems (e.g., Solaris), the function
- is present but does not actually do anything.
+ Asynchronous I/O depends on an effective support by the operating
+ system, which some operating systems lack. If there is no operating
+ system support then setting this parameter to anything but zero will
+ result in an error.
</para>
<para>
@@ -3852,10 +3851,7 @@ <title>Recovery</title>
<literal>off</literal>, <literal>on</literal> and
<literal>try</literal> (the default). The setting
<literal>try</literal> enables
- prefetching only if the operating system provides the
- <function>posix_fadvise</function> function, which is currently used
- to implement prefetching. Note that some operating systems provide the
- function, but it doesn't do anything.
+ prefetching only if the operating system provides prefetching support.
</para>
<para>
Prefetching blocks that will soon be needed can reduce I/O wait times
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index d5df65bc693..72b73dbf113 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -841,8 +841,8 @@ <title><acronym>WAL</acronym> Configuration</title>
The <xref linkend="guc-maintenance-io-concurrency"/> and
<xref linkend="guc-wal-decode-buffer-size"/> settings limit prefetching
concurrency and distance, respectively. By default, it is set to
- <literal>try</literal>, which enables the feature on systems where
- <function>posix_fadvise</function> is available.
+ <literal>try</literal>, which enables the feature on systems that have
+ prefetching support.
</para>
</sect1>
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 6202c5ebe44..c1c6c2811c9 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -1212,7 +1212,7 @@ check_effective_io_concurrency(int *newval, void **extra,
GucSource source)
#ifndef USE_PREFETCH
if (*newval != 0)
{
- GUC_check_errdetail("\"effective_io_concurrency\" must be set
to 0 on platforms that lack posix_fadvise().");
+ GUC_check_errdetail("\"effective_io_concurrency\" must be set
to 0 on platforms that lack prefetching support.");
return false;
}
#endif /* USE_PREFETCH */
@@ -1225,7 +1225,7 @@ check_maintenance_io_concurrency(int *newval, void
**extra, GucSource source)
#ifndef USE_PREFETCH
if (*newval != 0)
{
- GUC_check_errdetail("\"maintenance_io_concurrency\" must be set
to 0 on platforms that lack posix_fadvise().");
+ GUC_check_errdetail("\"maintenance_io_concurrency\" must be set
to 0 on platforms that lack prefetching support.");
return false;
}
#endif /* USE_PREFETCH */
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 3944321ff37..2830b310e0b 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2068,40 +2068,61 @@ FileClose(File file)
/*
* FilePrefetch - initiate asynchronous read of a given range of the file.
*
- * Currently the only implementation of this function is using posix_fadvise
- * which is the simplest standardized interface that accomplishes this.
- * We could add an implementation using libaio in the future; but note that
- * this API is inappropriate for libaio, which wants to have a buffer provided
- * to read into.
+ * Returns 0 on success, otherwise an errno error code (like posix_fadvise()).
+ *
+ * posix_fadvise() is the simplest standardized interface that accomplishes
+ * this. We could add an implementation using libaio in the future; but note
+ * that this API is inappropriate for libaio, which wants to have a buffer
+ * provided to read into.
*/
int
FilePrefetch(File file, off_t offset, off_t amount, uint32 wait_event_info)
{
-#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
- int returnCode;
-
Assert(FileIsValid(file));
DO_DB(elog(LOG, "FilePrefetch: %d (%s) " INT64_FORMAT " " INT64_FORMAT,
file, VfdCache[file].fileName,
(int64) offset, (int64) amount));
- returnCode = FileAccess(file);
- if (returnCode < 0)
- return returnCode;
+#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
+ {
+ int returnCode;
+
+ returnCode = FileAccess(file);
+ if (returnCode < 0)
+ return returnCode;
retry:
- pgstat_report_wait_start(wait_event_info);
- returnCode = posix_fadvise(VfdCache[file].fd, offset, amount,
- POSIX_FADV_WILLNEED);
- pgstat_report_wait_end();
+ pgstat_report_wait_start(wait_event_info);
+ returnCode = posix_fadvise(VfdCache[file].fd, offset, amount,
+
POSIX_FADV_WILLNEED);
+ pgstat_report_wait_end();
- if (returnCode == EINTR)
- goto retry;
+ if (returnCode == EINTR)
+ goto retry;
- return returnCode;
+ return returnCode;
+ }
+#elif defined(__darwin__)
+ {
+ struct radvisory
+ {
+ off_t ra_offset; /* offset into the file
*/
+ int ra_count; /* size of the
read */
+ } ra;
+ int returnCode;
+
+ ra.ra_offset = offset;
+ ra.ra_count = amount;
+ pgstat_report_wait_start(wait_event_info);
+ returnCode = fcntl(VfdCache[file].fd, F_RDADVISE, &ra);
+ pgstat_report_wait_end();
+ if (returnCode != -1)
+ return 0;
+ else
+ return errno;
+ }
#else
- Assert(FileIsValid(file));
return 0;
#endif
}
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index e799c2989b8..d603b87afd3 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -139,11 +139,10 @@
/*
* USE_PREFETCH code should be compiled only if we have a way to implement
* prefetching. (This is decoupled from USE_POSIX_FADVISE because there
- * might in future be support for alternative low-level prefetch APIs.
- * If you change this, you probably need to adjust the error message in
- * check_effective_io_concurrency.)
+ * might in future be support for alternative low-level prefetch APIs,
+ * as well as platform-specific APIs defined elsewhere.)
*/
-#ifdef USE_POSIX_FADVISE
+#if defined(USE_POSIX_FADVISE)
#define USE_PREFETCH
#endif
diff --git a/src/include/port/darwin.h b/src/include/port/darwin.h
index 15fb69d6dbb..6aa2ea70f6b 100644
--- a/src/include/port/darwin.h
+++ b/src/include/port/darwin.h
@@ -6,3 +6,8 @@
#define HAVE_FSYNC_WRITETHROUGH
#endif
+
+/*
+ * macOS has a platform-specific implementation of prefetching.
+ */
+#define USE_PREFETCH
base-commit: e3ec9dc1bf4983fcedb6f43c71ea12ee26aefc7a
--
2.46.0