On Sun, Nov 11, 2018 at 9:59 PM Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> There is one major problem with this patch
If there's only one, you're doing great! Although admittedly this seems like a big one...

> 1. Go back to the current pressure-valve strategy: make the sending
> side perform the fsync(), if it can't immediately write to the pipe.

As you say, this will happen significantly more often with deduplication. That deduplication logic got added in response to a real need. Before that, you could cause an individual backend to start doing its own fsyncs() with something as simple as a bulk load. The queue would absorb most of them, but not all, and the performance ramifications were noticeable.

> 2. Offload the BufferSync() work to bgwriter, so the checkpointer can
> keep draining the pipe. Communication between checkpointer and
> bgwriter can be fairly easily multiplexed with the pipe draining work.

That sounds a little like you are proposing to go back to the way things were before 806a2aee3791244bf0f916729bfdb5489936e068 (and, belatedly, bf405ba8e460051e715d0a91442b579e590328ce), although I guess the division of labor wouldn't be quite the same.

> 3. Multiplex the checkpointer's work: Use LWLockConditionalAcquire()
> when locking buffers, and if that fails, try to drain the pipe, and
> then fall back to a LWLockTimedAcquire(), drain pipe, repeat loop. I
> can hear you groan already; that doesn't seem particularly elegant,
> and there are portability problems implementing LWLockTimedAcquire():
> semtimedop() and sem_timedwait() are not available on all platforms
> (eg macOS). Maybe pthread_timed_condwait() could help (!).

You don't really need to invent LWLockTimedAcquire(). You could just keep retrying LWLockConditionalAcquire() in a delay loop. I agree that doesn't seem particularly elegant, though.

I still feel like this whole pass-the-fds-to-the-checkpointer thing is a bit of a fool's errand. I mean, there's no guarantee that the first FD that gets passed to the checkpointer is the first one opened, or even the first one written, is there?
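For readers less familiar with the mechanism being debated: on Unix-like systems an open descriptor is normally handed from one process to another over a connected Unix-domain socket using an SCM_RIGHTS control message. The following is a minimal, generic sketch of that technique, assuming a connected AF_UNIX socket; the helper names send_fd() and recv_fd() are hypothetical and are not taken from the patch under discussion.

```c
#define _DEFAULT_SOURCE /* expose CMSG_SPACE and friends on glibc */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one open descriptor over a connected
 * AF_UNIX socket. Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd_to_send)
{
    char dummy = 'F';           /* at least one byte of ordinary data must accompany the fd */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {                     /* union guarantees suitable alignment for struct cmsghdr */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;   /* kernel installs a duplicate of the fd in the receiver */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor. Returns the new fd or -1. */
static int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    if (msg.msg_flags & MSG_CTRUNC)  /* control data truncated: descriptor was lost */
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

To try it within a single process, create a pair with socketpair(AF_UNIX, SOCK_STREAM, 0, sv), call send_fd(sv[0], fd) and then recv_fd(sv[1]). The received descriptor has its own number but refers to the same open file description, so writes through it reach the same file. Passing a descriptor only transfers access to an already-open file; nothing in the message records when that file was opened or first written, so the order in which descriptors arrive says nothing about either.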
It seems like if you wanted to make this work reliably, you'd need to do it the other way around: have the checkpointer (or some other background process) open all the FDs, and anybody else who wants to have one open get it from the checkpointer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company