Hi, On 2017-06-15 15:06:43 -0700, Jeff Janes wrote: > > Well, wal_writer_delay doesn't work if walwriter is in sleep mode, and > > this afaics would allow wal writer to go into sleep mode with half a > > page filled, and it'd not be woken up again until the page is filled. > > No? > > > > If it is taking the big sleep, then we always wake it up, since > acd4c7d58baf09f. > > I didn't change that part. I only changed what we do if it not hibernating.
Right, I hadn't read enough of the context. > It looks like only limited consolidation was happening, the number of kills > invoked was more than half of the number of SetLatch. I think the reason > is what when kill(owner_pid, SIGUSR1) is called, the kernel immediately > suspends the calling process and gives the signalled process a chance to > run in that time slice. The Wal Writer gets woken up, sees that it only > has a partial page to write and decides not to do anything, but resets the > latch. It does this fast enough that the subscription worker hasn't > migrated CPUs yet. I think part of the problem here is that latches signal the owning process even if the owning process isn't actually sleeping - that's obviously quite pointless. In cases where walwriter is busy, that actually causes a lot of superflous interrupted syscalls, because it actually never ends up waiting on the latch. There's also a lot of superflous context switches. I think we can avoid doing so quite easily, as e.g. with the attached patch. Could you check how much that helps your benchmark? > The first change made the WALWriter wake up and do a write and flush > whenever an async commit occurred and there was a completed WAL page to be > written. This way the hint bits could be set while the heap page was still > hot, because they couldn't get set until the WAL covering the hinted-at > transaction commit is flushed. > > The second change said we can set hint bits before the WAL covering the > hinted-at transaction is flushed, as long the page LSN is newer than that > (so the wal covering the hinted-at transaction commit must be flushed > before the page containing the hint bit can be written). > > Then the third change makes the wal writer only write the full pages most > of the times it is woken up, not flush them. It only flushes them once > every wal_writer_delay or wal_writer_flush_after, regardless of how often > it is woken up. > > It seems like the third change rendered the first one useless. I don't think so. Isn't the walwriter writing out the contents of the WAL is quite important because it makes room in wal_buffers for new WAL? I suspect we actually should wake up wal-writer *more* aggressively, to offload wal fsyncs from individual backends, even when they're not committing. Right now we fsync whenever a segment is finished - we really don't want to do that in backends that could do other useful work. I suspect it'd be a good idea to remove that logic from XLogWrite() and replace it with if (ProcGlobal->walwriterLatch) SetLatch(ProcGlobal->walwriterLatch); > Wouldn't it > better, and much simpler, just to have reverted the first change once the > second one was done? I think both can actually happen independently, no? It's pretty common for the page lsn to be *older* than the corresponding commit. In fact you'll always hit that case unless there's also concurrent writes also touching said page. > If that were done, would the third change still be > needed? (It did seem to add some other features as well, so I'm going to > assume we still want those...). It's still useful, yes. It avoids flushing the WAL too often (page-by-page when there's a lot of writes), which can eat up a lot of IOPS on fast storage. > But now the first change is even worse than useless, it is positively > harmful. The only thing to stop it from waking the WALWriter for every > async commit is this line: > > /* if we have already flushed that far, we're done */ > if (WriteRqstPtr <= LogwrtResult.Flush) > return; > > Since LogwrtResult.Flush doesn't advance anymore, this condition doesn't > becomes false due to us waking walwriter, it only becomes false once the > timeout expires (which it would have done anyway with no help from us), or > once wal_writer_flush_after is exceeded. So now every async commit is > waking the walwriter. Most of those wake up do nothing (there is not a > completely new patch to write), some write one completed page but don't > flush anything, and very few do a flush, and that one would have been done > anyway. s/completely new patch/completely new page/? In my opinion we actually *do* want to write (but not flush!) out such pages, so I'm not sure I agree with that logic. Have to think about this some more... Greetings, Andres Freund
>From 50a3e62b16b752837f5a388e3259fee2bca6c2ab Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Sat, 24 Jun 2017 16:46:11 -0700 Subject: [PATCH 1/3] Don't signal latch owners if owner not waiting on latch. Doing so can generate a lot of useless syscalls and context switches in cases where the owner of the latch is busy doing actual work. Discussion: https://postgr.es/m/CAMkU=1z6_k4me12vnirgv5qnu58+czqdnxd+pb5bzxnvovv...@mail.gmail.com --- src/backend/storage/ipc/latch.c | 62 ++++++++++++++++++++++++++++++++++------- src/include/storage/latch.h | 1 + 2 files changed, 53 insertions(+), 10 deletions(-) diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c index 07b1364de8..c1a1a76d0a 100644 --- a/src/backend/storage/ipc/latch.c +++ b/src/backend/storage/ipc/latch.c @@ -221,6 +221,7 @@ InitLatch(volatile Latch *latch) { latch->is_set = false; latch->owner_pid = MyProcPid; + latch->owner_waiting = false; latch->is_shared = false; #ifndef WIN32 @@ -268,6 +269,7 @@ InitSharedLatch(volatile Latch *latch) latch->is_set = false; latch->owner_pid = 0; + latch->owner_waiting = false; latch->is_shared = true; } @@ -311,6 +313,7 @@ DisownLatch(volatile Latch *latch) Assert(latch->owner_pid == MyProcPid); latch->owner_pid = 0; + latch->owner_waiting = false; /* paranoia */ } /* @@ -466,7 +469,17 @@ SetLatch(volatile Latch *latch) sendSelfPipeByte(); } else - kill(owner_pid, SIGUSR1); + { + /* + * If the owner of the latch is currently waiting on it, send signal + * to wake up that process. The read-barrier is necessary so we see + * an up-to-date value, and it pairs with the memory barrier in the + * path setting owner_waiting in WaitEventSetWait. + */ + pg_read_barrier(); + if (latch->owner_waiting) + kill(owner_pid, SIGUSR1); + } #else /* @@ -954,6 +967,12 @@ WaitEventSetWait(WaitEventSet *set, long timeout, * immediately, avoid blocking again. We don't attempt to report any * other events that might also be satisfied. * + * To avoid other processes signalling us if we're not waiting, + * wasting context switches, we first check whether the latch is + * already set, and only enable signalling from other processes after + * that. To avoid a race, we've to recheck whether the latch is set + * after asking to be woken up. + * * If someone sets the latch between this and the * WaitEventSetWaitBlock() below, the setter will write a byte to the * pipe (or signal us and the signal handler will do that), and the @@ -976,17 +995,37 @@ WaitEventSetWait(WaitEventSet *set, long timeout, * ordering, so that we cannot miss seeing is_set if a notification * has already been queued. */ - if (set->latch && set->latch->is_set) + if (set->latch) { - occurred_events->fd = PGINVALID_SOCKET; - occurred_events->pos = set->latch_pos; - occurred_events->user_data = - set->events[set->latch_pos].user_data; - occurred_events->events = WL_LATCH_SET; - occurred_events++; - returned_events++; + bool latch_set = false; + + if (set->latch->is_set) + { + latch_set = true; + } + else + { + set->latch->owner_waiting = true; + pg_memory_barrier(); + if (set->latch->is_set) + { + latch_set = true; + } + } + + if (latch_set) + { + occurred_events->fd = PGINVALID_SOCKET; + occurred_events->pos = set->latch_pos; + occurred_events->user_data = + set->events[set->latch_pos].user_data; + occurred_events->events = WL_LATCH_SET; + occurred_events++; + returned_events++; + + break; + } - break; } /* @@ -1016,6 +1055,9 @@ WaitEventSetWait(WaitEventSet *set, long timeout, waiting = false; #endif + if (set->latch) + set->latch->owner_waiting = false; + pgstat_report_wait_end(); return returned_events; diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h index 73abfafec5..442344d914 100644 --- a/src/include/storage/latch.h +++ b/src/include/storage/latch.h @@ -112,6 +112,7 @@ typedef struct Latch sig_atomic_t is_set; bool is_shared; int owner_pid; + int owner_waiting; /* int, so all platforms can set atomically */ #ifdef WIN32 HANDLE event; #endif -- 2.13.1.392.g8d1b10321b.dirty
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers