Hi,

IRIX gave the world O_DIRECT, and then every Unix I've used followed their lead except Apple's, which gave the world fcntl(fd, F_NOCACHE, 1). From what I could find in public discussion, this API difference may stem from the caching policy being controlled at the per-file (vnode) level in older macOS (and perhaps ancestors), but since 10.4 it's per file descriptor, so approximately like O_DIRECT on other systems. The precise effects and constraints of O_DIRECT/F_NOCACHE are different across operating systems and file systems in some subtle and not-so-subtle ways, but the general concept is the same: try to avoid buffering.
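To make the API difference concrete, here is a minimal standalone sketch of the two styles behind one call. It is not code from the attached patch: the name open_maybe_direct() and its boolean parameter are hypothetical, and it assumes that a platform with neither O_DIRECT nor F_NOCACHE should simply ignore the request.

```c
/* glibc only exposes O_DIRECT when _GNU_SOURCE is defined before the includes. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): open a file, optionally asking
 * the kernel to avoid buffering.  Returns a file descriptor, or -1 with
 * errno set.
 */
int
open_maybe_direct(const char *path, int flags, mode_t mode, bool direct)
{
    int fd;

#if defined(O_DIRECT)
    /* Most Unixes: the request is an open() flag. */
    if (direct)
        flags |= O_DIRECT;
#endif

    fd = open(path, flags, mode);
    if (fd < 0)
        return -1;

#if !defined(O_DIRECT) && defined(F_NOCACHE)
    /* macOS: the request is made after open(), per file descriptor. */
    if (direct && fcntl(fd, F_NOCACHE, 1) < 0)
    {
        int save_errno = errno;

        close(fd);
        errno = save_errno;
        return -1;
    }
#endif

    /* Assumption: with neither API available, 'direct' is silently ignored. */
    return fd;
}
```

A caller would write something like open_maybe_direct(path, O_RDWR, 0, true) and would still have to cope with failures that are specific to the file system, for example EINVAL from open() where O_DIRECT is not supported.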
I thought about a few different ways to encapsulate this API difference in PostgreSQL, and toyed with two:

1. We could define our own fake O_DIRECT flag, and translate that to the right thing inside BasicOpenFilePerm(). That seems a bit icky. We'd have to be careful not to collide with system-defined flags and worry about changes. We do that sort of thing for Windows, though that's a bit different: there we translate *all* the flags from POSIXesque to Windowsian.

2. We could make an extended BasicOpenFilePerm() variant that takes a separate boolean parameter for direct, so that we don't have to hijack any flag space, but now we need new interfaces just to tolerate a rather niche system.

Here's a draft patch like #2, just for discussion. Better ideas?

The reason I want to get direct I/O working on this "client" OS is that the AIO project will propose to use direct I/O for the buffer pool as an option, and I would like Macs to be able to do that, primarily for the sake of developers trying out the patch set. Based on memories from the good old days of attending conferences, a decent percentage of PostgreSQL developers are on Macs.

As it stands, the patch only actually has any effect if you set wal_level=minimal and max_wal_senders=0, which is a configuration that I guess almost no-one uses. Otherwise xlog.c assumes that the filesystem is going to be used for data exchange with replication processes (something we should replace with WAL buffers in shmem some time soon), so for now it's better to keep the data in page cache since it'll be accessed again soon.

Unfortunately, this change makes pg_test_fsync show a very slightly lower number for open_data_sync on my ancient Intel Mac, but pg_test_fsync isn't really representative anymore since minimal logging is by now unusual (I guess pg_test_fsync would ideally do the test with and without direct to make that clearer).
Whether this is a good option for the WAL is separate from whether it's a good option for relation data (i.e. a way to avoid large-scale double buffering, at the cost of new, different problems), and later patches will propose new separate GUCs to control that.
0001-Support-direct-I-O-on-macOS.patch