Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-07-11 Thread Heikki Linnakangas
On 11/07/18 04:16, Thomas Munro wrote: On Tue, Jul 10, 2018 at 11:39 PM, Heikki Linnakangas wrote: I don't have a FreeBSD machine at hand, so I didn't try fixing that patch. I updated the FreeBSD version to use the header test approach you showed, and pushed that too. FWIW the build farm has

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-07-10 Thread Thomas Munro
On Tue, Jul 10, 2018 at 11:39 PM, Heikki Linnakangas wrote: > The 'postmaster_possibly_dead' flag is not reset anywhere. So if a process > receives a spurious death signal, even though postmaster is still alive, > PostmasterIsAlive() will continue to use the slow path. +1 > postmaster_possibly_d

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-07-10 Thread Heikki Linnakangas
On 27/06/18 08:26, Thomas Munro wrote: On Wed, Apr 25, 2018 at 6:23 PM, Thomas Munro wrote: On Tue, Apr 24, 2018 at 7:37 PM, Michael Paquier wrote: Thomas, trying to understand here... Why this place for the signal initialization? Wouldn't InitPostmasterChild() be a more logical place as we

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-06-26 Thread Thomas Munro
On Wed, Apr 25, 2018 at 6:23 PM, Thomas Munro wrote: > On Tue, Apr 24, 2018 at 7:37 PM, Michael Paquier wrote: >> Thomas, trying to understand here... Why this place for the signal >> initialization? Wouldn't InitPostmasterChild() be a more logical place >> as we'd want to have this logic caugh

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-24 Thread Thomas Munro
On Tue, Apr 24, 2018 at 7:37 PM, Michael Paquier wrote: > I have been looking at the proposed set for Linux, and the numbers are > here. By replaying 1GB worth of WAL after a pgbench run with the data > folder on a tmpfs the recovery time goes from 33s to 28s, so that's a > nice gain. Thanks for

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-24 Thread Michael Paquier
On Sat, Apr 21, 2018 at 12:25:27PM +1200, Thomas Munro wrote: > Here's a new version, because FreeBSD's new interface changed slightly. I have been looking at the proposed set for Linux, and the numbers are here. By replaying 1GB worth of WAL after a pgbench run with the data folder on a tmpfs th

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-20 Thread Thomas Munro
Here's a new version, because FreeBSD's new interface changed slightly. -- Thomas Munro http://www.enterprisedb.com Attachments: 0001-Use-signals-for-postmaster-death-on-Linux-v3.patch, 0002-Use-signals-for-postmaster-death-on-FreeBSD-v3.patch

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-19 Thread Thomas Munro
On Thu, Apr 19, 2018 at 6:20 PM, Andres Freund wrote: > On April 18, 2018 8:05:50 PM PDT, Thomas Munro > wrote: >>By the way, these patches only use the death signal to make >>PostmasterIsAlive() fast, for use by busy loops like recovery. The >>postmaster pipe is still used for IO/timeout loops
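
Editor's sketch (not taken from the patches in this thread): a rough, Linux-only illustration of the death-signal idea discussed above, together with the point raised later in the thread that the "possibly dead" flag should be reset after a spurious signal. All function and variable names here are hypothetical, and getppid() merely stands in for whatever slow check a real implementation would use (the thread mentions a read() on the postmaster-death pipe).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the signal handler. It is only a hint, not proof of parent death. */
static volatile sig_atomic_t parent_possibly_dead = 0;

static void
parent_death_handler(int signo)
{
    (void) signo;
    parent_possibly_dead = 1;
}

/*
 * Install the handler and ask the kernel to deliver `signo` when the parent
 * exits. Note: if the parent died before this call, no signal will arrive,
 * so a real caller would do one slow check right after setting this up.
 */
static int
request_parent_death_signal(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = parent_death_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;
    return prctl(PR_SET_PDEATHSIG, (unsigned long) signo);
}

/* Hypothetical slow path: one syscall per call. */
static bool
parent_is_alive_slow(pid_t expected_parent)
{
    return getppid() == expected_parent;
}

/* Fast path: no syscall unless a signal has raised the flag. */
static bool
parent_is_alive(pid_t expected_parent)
{
    if (!parent_possibly_dead)
        return true;

    /* Clear first, so a signal arriving during the slow check is not lost. */
    parent_possibly_dead = 0;
    if (parent_is_alive_slow(expected_parent))
        return true;            /* spurious signal; fast path is re-armed */

    parent_possibly_dead = 1;   /* stay on the slow path from now on */
    return false;
}
```

A busy loop would call parent_is_alive() on every iteration; in the common case that costs one read of a flag rather than a system call.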

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-18 Thread Andres Freund
On April 18, 2018 8:05:50 PM PDT, Thomas Munro wrote: >On Wed, Apr 18, 2018 at 5:04 PM, Thomas Munro > wrote: >> On Wed, Apr 11, 2018 at 10:22 PM, Heikki Linnakangas > wrote: On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund > wrote: > That person said he'd work on adding an equival

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-18 Thread Thomas Munro
On Wed, Apr 18, 2018 at 5:04 PM, Thomas Munro wrote: > On Wed, Apr 11, 2018 at 10:22 PM, Heikki Linnakangas wrote: >>> On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund >>> wrote: That person said he'd work on adding an equivalent of linux' prctl(PR_SET_PDEATHSIG) to FreeBSD. > > Here is

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-17 Thread Thomas Munro
On Wed, Apr 11, 2018 at 10:22 PM, Heikki Linnakangas wrote: >> On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund >> wrote: >>> That person said he'd work on adding an equivalent of linux' >>> prctl(PR_SET_PDEATHSIG) to FreeBSD. Here is an implementation of Andres's idea for Linux, and also for pat

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-11 Thread Thomas Munro
On Wed, Apr 11, 2018 at 10:22 PM, Heikki Linnakangas wrote: > On 10/04/18 04:36, Thomas Munro wrote: >> Just an idea, not tested: what about a reusable WaitEventSet with zero >> timeout? Using the kqueue patch, that'd call kevent() which'd return >> immediately and tell you if any postmaster deat

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-11 Thread Heikki Linnakangas
On 10/04/18 04:36, Thomas Munro wrote: On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund wrote: I coincidentally got pinged about our current approach causing performance problems on FreeBSD and started writing a patch. The problem there appears to be that constantly attaching events to the read

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Andres Freund
On April 9, 2018 6:57:23 PM PDT, Alvaro Herrera wrote: >Andres Freund wrote: >> >> On April 9, 2018 6:31:07 PM PDT, Alvaro Herrera > wrote: > >> >Would it work to use this second pipe, to which each child writes a >> >byte that postmaster never reads, and then rely on SIGPIPE when >> >postmaste

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Alvaro Herrera
Andres Freund wrote: > > On April 9, 2018 6:31:07 PM PDT, Alvaro Herrera > wrote: > >Would it work to use this second pipe, to which each child writes a > >byte that postmaster never reads, and then rely on SIGPIPE when > >postmaster dies? Then we never need to do a syscall. > > I'm not follo

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Andres Freund
On April 9, 2018 6:36:19 PM PDT, Thomas Munro wrote: >On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund >wrote: >> I coincidentally got pinged about our current approach causing >> performance problems on FreeBSD and started writing a patch. The >> problem there appears to be that constantly at

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Thomas Munro
On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund wrote: > I coincidentally got pinged about our current approach causing > performance problems on FreeBSD and started writing a patch. The > problem there appears to be that constantly attaching events to the read > pipe end, from multiple processes

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Andres Freund
On April 9, 2018 6:31:07 PM PDT, Alvaro Herrera wrote: >Andres Freund wrote: > >> Another approach, that's simpler to implement, is to simply have a >> second selfpipe, just for WL_POSTMASTER_DEATH. > >Would it work to use this second pipe, to which each child writes a >byte >that postmaster nev

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Alvaro Herrera
Andres Freund wrote: > Another approach, that's simpler to implement, is to simply have a > second selfpipe, just for WL_POSTMASTER_DEATH. Would it work to use this second pipe, to which each child writes a byte that postmaster never reads, and then rely on SIGPIPE when postmaster dies? Then we

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Andres Freund
Hi, On 2018-04-05 12:20:38 -0700, Andres Freund wrote: > > While it's not POSIX, at least some platforms are capable of delivering > > a separate signal on parent process death. Perhaps using that where > > available would be enough of an answer. > > Yea, that'd work on linux. Which is probably

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-06 Thread Stephen Frost
Greetings, * Heikki Linnakangas (hlinn...@iki.fi) wrote: > On 06/04/18 19:39, Andres Freund wrote: > >On 2018-04-06 07:39:28 -0400, Stephen Frost wrote: > >>While I tend to agree that it'd be nice to just make it cheaper, that > >>doesn't seem like something that we'd be likely to back-patch and I

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-06 Thread Heikki Linnakangas
On 06/04/18 19:39, Andres Freund wrote: On 2018-04-06 07:39:28 -0400, Stephen Frost wrote: While I tend to agree that it'd be nice to just make it cheaper, that doesn't seem like something that we'd be likely to back-patch and I tend to share Heikki's feelings that this is a performance regressi

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-06 Thread Andres Freund
Hi, On 2018-04-06 07:39:28 -0400, Stephen Frost wrote: > While I tend to agree that it'd be nice to just make it cheaper, that > doesn't seem like something that we'd be likely to back-patch and I tend > to share Heikki's feelings that this is a performance regression we > should be considering fi

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-06 Thread Stephen Frost
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2018-04-05 14:39:27 -0400, Tom Lane wrote: > > Andres Freund writes: > > > ISTM the better approach would be to try to reduce the cost of > > > PostmasterIsAlive() on common platforms - it should be nearly free if > > > done right. > >

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Michael Paquier
On Thu, Apr 05, 2018 at 02:39:27PM -0400, Tom Lane wrote: > While it's not POSIX, at least some platforms are capable of delivering > a separate signal on parent process death. Perhaps using that where > available would be enough of an answer. Are you referring to prctl here? +1 on improving per

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Andres Freund
On 2018-04-05 14:39:27 -0400, Tom Lane wrote: > Andres Freund writes: > > ISTM the better approach would be to try to reduce the cost of > > PostmasterIsAlive() on common platforms - it should be nearly free if > > done right. > > +1 if it's doable. > > > One way to achieve that would e.g. to st

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Tom Lane
Andres Freund writes: > ISTM the better approach would be to try to reduce the cost of > PostmasterIsAlive() on common platforms - it should be nearly free if > done right. +1 if it's doable. > One way to achieve that would e.g. to stop ignoring SIGPIPE and instead > check for postmaster death i

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Andres Freund
Hi, On 2018-04-05 10:23:43 +0300, Heikki Linnakangas wrote: > Profiling that, without any patches applied, I noticed that a lot of time > was spent in read()s on the postmaster-death pipe, i.e. in > PostmasterIsAlive(). We call that between *every* WAL record. > That seems like an utter waste of

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Simon Riggs
On 5 April 2018 at 08:23, Heikki Linnakangas wrote: > That seems like an utter waste of time. I'm almost inclined to call that a > performance bug. As a straightforward fix, I'd suggest that we call > HandleStartupProcInterrupts() in the WAL redo loop, not on every record, but > only e.g. every 3

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Alvaro Herrera
Heikki Linnakangas wrote: > That seems like an utter waste of time. I'm almost inclined to call that a > performance bug. As a straightforward fix, I'd suggest that we call > HandleStartupProcInterrupts() in the WAL redo loop, not on every record, but > only e.g. every 32 records. That would make
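
Editor's sketch (hypothetical names, not PostgreSQL code): the straightforward fix quoted above amounts to running the expensive check only once every N iterations of the redo loop. A minimal illustration, assuming an interval of 32 and a caller-supplied liveness callback standing in for HandleStartupProcInterrupts():

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed interval, matching the "every 32 records" figure in the quote. */
#define LIVENESS_CHECK_INTERVAL 32

/* Hypothetical callback type for the expensive liveness check. */
typedef bool (*liveness_check_fn)(void);

/*
 * Replay up to nrecords records, running the liveness check only on every
 * LIVENESS_CHECK_INTERVAL-th iteration. Returns how many records were
 * replayed before the loop finished or the check reported death.
 */
static uint64_t
replay_records(uint64_t nrecords, liveness_check_fn parent_is_alive)
{
    uint64_t replayed = 0;

    for (uint64_t i = 0; i < nrecords; i++)
    {
        if (i % LIVENESS_CHECK_INTERVAL == 0 && !parent_is_alive())
            break;

        /* A real redo loop would apply the WAL record here. */
        replayed++;
    }
    return replayed;
}
```

The trade-off is latency: with this scheme the loop can run up to 31 more records after the parent has died before it notices.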