On 14.08.24 16:39, Peter Eisentraut wrote:
On 14.08.24 14:36, Thomas Munro wrote:
On Wed, Aug 14, 2024 at 7:04 PM Peter Eisentraut <pe...@eisentraut.org> wrote:
Attached is a patch to implement this.  It seems to work, but of course
it's kind of hard to tell whether it actually does anything useful.

Header order problem: pg_config_os.h defines __darwin__, but
pg_config_manual.h is included first, and tests __darwin__.  I hacked
my way around that, and then made a table of 40,000,000 integers in a
2GB buffer pool.  I used "select count(pg_buffercache_evict(buffered))
from pg_buffer_cache", and "sudo purge", to clear the two layers of
cache for each test, and then measured:

maintenance_io_concurrency=0,  ANALYZE: 2311ms
maintenance_io_concurrency=10, ANALYZE:  652ms
maintenance_io_concurrency=25, ANALYZE:  389ms

It works!

Cool!  I'll work on a more polished patch.

Here it is.

Some interesting questions:

What to do about the order of the symbols and include files. I threw something into src/include/port/darwin.h, but I'm not sure if that's good. Alternatively, we could not use __darwin__ but instead the more standard and predefined defined(__APPLE__) && defined(__MACH__).

How to document it. The current documentation makes references mainly to the availability of posix_fadvise(). That seems quite low-level. How could a user of a prepared package even find out about that? Should we just say "requires OS support" (kind of like I did here) and you can query the effective state by looking at the *_io_concurrency settings? Or do we need a read-only parameter that shows whether prefetch support exists (kind of along the lines of huge pages)?

Btw., for context, here is what I gather the prefetch support (with this patch) is:

cygwin          posix_fadvise
darwin          fcntl
freebsd         posix_fadvise
linux           posix_fadvise
netbsd          posix_fadvise
openbsd         no
solaris         fake
win32           no

(There is also the possibility that we provide an implementation of posix_fadvise() for macOS that wraps the platform-specific code in this patch. And then Apple could just take that. ;-) )
From d318c4108b9e05a4828f6e7f71af34c3ca89b3ed Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Fri, 16 Aug 2024 20:15:43 +0200
Subject: [PATCH v2] Add prefetching support on macOS

macOS doesn't have posix_fadvise(), but fcntl() with the F_RDADVISE
command does the same thing.

Discussion: 
https://www.postgresql.org/message-id/flat/0827edec-1317-4917-a186-035eb1e3241d%40eisentraut.org
---
 doc/src/sgml/config.sgml        | 14 +++-----
 doc/src/sgml/wal.sgml           |  4 +--
 src/backend/commands/variable.c |  4 +--
 src/backend/storage/file/fd.c   | 59 ++++++++++++++++++++++-----------
 src/include/pg_config_manual.h  |  7 ++--
 src/include/port/darwin.h       |  5 +++
 6 files changed, 57 insertions(+), 36 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2937384b001..c6d2fa2148e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2679,11 +2679,10 @@ <title>Asynchronous Behavior</title>
         </para>
 
         <para>
-         Asynchronous I/O depends on an effective 
<function>posix_fadvise</function>
-         function, which some operating systems lack.  If the function is not
-         present then setting this parameter to anything but zero will result
-         in an error.  On some operating systems (e.g., Solaris), the function
-         is present but does not actually do anything.
+         Asynchronous I/O depends on an effective support by the operating
+         system, which some operating systems lack.  If there is no operating
+         system support then setting this parameter to anything but zero will
+         result in an error.
         </para>
 
         <para>
@@ -3852,10 +3851,7 @@ <title>Recovery</title>
         <literal>off</literal>, <literal>on</literal> and
         <literal>try</literal> (the default).  The setting
         <literal>try</literal> enables
-        prefetching only if the operating system provides the
-        <function>posix_fadvise</function> function, which is currently used
-        to implement prefetching.  Note that some operating systems provide the
-        function, but it doesn't do anything.
+        prefetching only if the operating system provides prefetching support.
        </para>
        <para>
         Prefetching blocks that will soon be needed can reduce I/O wait times
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index d5df65bc693..72b73dbf113 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -841,8 +841,8 @@ <title><acronym>WAL</acronym> Configuration</title>
    The <xref linkend="guc-maintenance-io-concurrency"/> and
    <xref linkend="guc-wal-decode-buffer-size"/> settings limit prefetching
    concurrency and distance, respectively.  By default, it is set to
-   <literal>try</literal>, which enables the feature on systems where
-   <function>posix_fadvise</function> is available.
+   <literal>try</literal>, which enables the feature on systems that have
+   prefetching support.
   </para>
  </sect1>
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 6202c5ebe44..c1c6c2811c9 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -1212,7 +1212,7 @@ check_effective_io_concurrency(int *newval, void **extra, 
GucSource source)
 #ifndef USE_PREFETCH
        if (*newval != 0)
        {
-               GUC_check_errdetail("\"effective_io_concurrency\" must be set 
to 0 on platforms that lack posix_fadvise().");
+               GUC_check_errdetail("\"effective_io_concurrency\" must be set 
to 0 on platforms that lack prefetching support.");
                return false;
        }
 #endif                                                 /* USE_PREFETCH */
@@ -1225,7 +1225,7 @@ check_maintenance_io_concurrency(int *newval, void 
**extra, GucSource source)
 #ifndef USE_PREFETCH
        if (*newval != 0)
        {
-               GUC_check_errdetail("\"maintenance_io_concurrency\" must be set 
to 0 on platforms that lack posix_fadvise().");
+               GUC_check_errdetail("\"maintenance_io_concurrency\" must be set 
to 0 on platforms that lack prefetching support.");
                return false;
        }
 #endif                                                 /* USE_PREFETCH */
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 3944321ff37..2830b310e0b 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2068,40 +2068,61 @@ FileClose(File file)
 /*
  * FilePrefetch - initiate asynchronous read of a given range of the file.
  *
- * Currently the only implementation of this function is using posix_fadvise
- * which is the simplest standardized interface that accomplishes this.
- * We could add an implementation using libaio in the future; but note that
- * this API is inappropriate for libaio, which wants to have a buffer provided
- * to read into.
+ * Returns 0 on success, otherwise an errno error code (like posix_fadvise()).
+ *
+ * posix_fadvise() is the simplest standardized interface that accomplishes
+ * this.  We could add an implementation using libaio in the future; but note
+ * that this API is inappropriate for libaio, which wants to have a buffer
+ * provided to read into.
  */
 int
 FilePrefetch(File file, off_t offset, off_t amount, uint32 wait_event_info)
 {
-#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
-       int                     returnCode;
-
        Assert(FileIsValid(file));
 
        DO_DB(elog(LOG, "FilePrefetch: %d (%s) " INT64_FORMAT " " INT64_FORMAT,
                           file, VfdCache[file].fileName,
                           (int64) offset, (int64) amount));
 
-       returnCode = FileAccess(file);
-       if (returnCode < 0)
-               return returnCode;
+#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
+       {
+               int                     returnCode;
+
+               returnCode = FileAccess(file);
+               if (returnCode < 0)
+                       return returnCode;
 
 retry:
-       pgstat_report_wait_start(wait_event_info);
-       returnCode = posix_fadvise(VfdCache[file].fd, offset, amount,
-                                                          POSIX_FADV_WILLNEED);
-       pgstat_report_wait_end();
+               pgstat_report_wait_start(wait_event_info);
+               returnCode = posix_fadvise(VfdCache[file].fd, offset, amount,
+                                                                  
POSIX_FADV_WILLNEED);
+               pgstat_report_wait_end();
 
-       if (returnCode == EINTR)
-               goto retry;
+               if (returnCode == EINTR)
+                       goto retry;
 
-       return returnCode;
+               return returnCode;
+       }
+#elif defined(__darwin__)
+       {
+               struct radvisory
+               {
+                       off_t           ra_offset;      /* offset into the file 
*/
+                       int                     ra_count;       /* size of the 
read     */
+               }                       ra;
+               int                     returnCode;
+
+               ra.ra_offset = offset;
+               ra.ra_count = amount;
+               pgstat_report_wait_start(wait_event_info);
+               returnCode = fcntl(VfdCache[file].fd, F_RDADVISE, &ra);
+               pgstat_report_wait_end();
+               if (returnCode != -1)
+                       return 0;
+               else
+                       return errno;
+       }
 #else
-       Assert(FileIsValid(file));
        return 0;
 #endif
 }
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index e799c2989b8..d603b87afd3 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -139,11 +139,10 @@
 /*
  * USE_PREFETCH code should be compiled only if we have a way to implement
  * prefetching.  (This is decoupled from USE_POSIX_FADVISE because there
- * might in future be support for alternative low-level prefetch APIs.
- * If you change this, you probably need to adjust the error message in
- * check_effective_io_concurrency.)
+ * might in future be support for alternative low-level prefetch APIs,
+ * as well as platform-specific APIs defined elsewhere.)
  */
-#ifdef USE_POSIX_FADVISE
+#if defined(USE_POSIX_FADVISE)
 #define USE_PREFETCH
 #endif
 
diff --git a/src/include/port/darwin.h b/src/include/port/darwin.h
index 15fb69d6dbb..6aa2ea70f6b 100644
--- a/src/include/port/darwin.h
+++ b/src/include/port/darwin.h
@@ -6,3 +6,8 @@
 #define HAVE_FSYNC_WRITETHROUGH
 
 #endif
+
+/*
+ * macOS has a platform-specific implementation of prefetching.
+ */
+#define USE_PREFETCH

base-commit: e3ec9dc1bf4983fcedb6f43c71ea12ee26aefc7a
-- 
2.46.0

Reply via email to