Re: checkpoints taking much longer than expected

2019-06-17 Thread Andres Freund
On 2019-06-16 12:25:58 -0400, Jeff Janes wrote: > Right, but true only because they were "checkpoint starting: immediate". > Otherwise the reported write time includes intentional sleeps added to > honor the checkpoint_completion_target. A bit confusing to report it that > way, I think. +1 It's
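To put rough numbers on Jeff's point about intentional sleeps: for a non-immediate checkpoint, PostgreSQL paces its writes so they finish at about checkpoint_completion_target of the checkpoint interval. The sketch below is only an approximation of that pacing for a purely time-driven checkpoint, not the checkpointer's actual logic. The function name is hypothetical, and the 0.5 target is an assumption (it was the default in PostgreSQL releases of that period; the thread does not state the configured value). The 60min timeout is the checkpoint_timeout quoted elsewhere in this thread.

```python
def spread_write_window_s(checkpoint_timeout_s: float, completion_target: float) -> float:
    """Approximate how long a timed checkpoint aims to spend writing.

    Hypothetical helper for illustration: it ignores WAL-volume-driven
    progress and the sync phase, both of which also affect real timings.
    """
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("completion_target must be between 0 and 1")
    return checkpoint_timeout_s * completion_target


timeout_s = 60 * 60  # checkpoint_timeout = 60min, as quoted in the thread
target = 0.5         # ASSUMPTION: value not given in the thread; old default

window_s = spread_write_window_s(timeout_s, target)
print(f"a timed checkpoint would aim to spread writes over about {window_s / 60:.0f} min")
```

Under these assumptions most of the reported write time for a timed checkpoint would be deliberate pacing, whereas an "immediate" checkpoint skips the pacing, so its write time reflects actual I/O.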

Re: checkpoints taking much longer than expected

2019-06-17 Thread Tiemen Ruiten
On Sun, Jun 16, 2019 at 8:57 PM Alvaro Herrera wrote: > On 2019-Jun-14, Peter J. Holzer wrote: > > > There was a discussion about ZFS' COW behaviour and PostgreSQL reusing > > WAL files not being a good combination about a year ago: > > > https://www.postgresql.org/message-id/flat/CACukRjO7DJvub8

Re: checkpoints taking much longer than expected

2019-06-16 Thread Stephen Frost
Greetings, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > On 2019-Jun-16, Stephen Frost wrote: > > > The issue being discussed here is writing out to the heap files during a > > checkpoint... > > We don't really know, as it was already established that the log line is > misattributing time

Re: checkpoints taking much longer than expected

2019-06-16 Thread Alvaro Herrera
On 2019-Jun-16, Stephen Frost wrote: > The issue being discussed here is writing out to the heap files during a > checkpoint... We don't really know, as it was already established that the log line is misattributing time spent ... -- Álvaro Herrera https://www.2ndQuadrant.com/ Po

Re: checkpoints taking much longer than expected

2019-06-16 Thread Stephen Frost
Greetings, * Tiemen Ruiten (t.rui...@tech-lab.io) wrote: > On Sun, Jun 16, 2019 at 7:30 PM Stephen Frost wrote: > > Ok, so you want fewer checkpoints because you expect to failover to a > > replica rather than recover the primary on a failure. If you're doing > > synchronous replication, then th

Re: checkpoints taking much longer than expected

2019-06-16 Thread Stephen Frost
Greetings, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > On 2019-Jun-16, Stephen Frost wrote: > > Not likely to help with what you're experiencing anyway though... > > My gut feeling is that you're wrong, since (as I understand) the > symptoms are the same. The issue in the linked-to thre

Re: checkpoints taking much longer than expected

2019-06-16 Thread Tiemen Ruiten
On Sun, Jun 16, 2019 at 7:30 PM Stephen Frost wrote: > Ok, so you want fewer checkpoints because you expect to failover to a > replica rather than recover the primary on a failure. If you're doing > synchronous replication, then that certainly makes sense. If you > aren't, then you're deciding

Re: checkpoints taking much longer than expected

2019-06-16 Thread Alvaro Herrera
On 2019-Jun-16, Stephen Frost wrote: > Not likely to help with what you're experiencing anyway though... My gut feeling is that you're wrong, since (as I understand) the symptoms are the same. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Re

Re: checkpoints taking much longer than expected

2019-06-16 Thread Stephen Frost
Greetings, * Tiemen Ruiten (t.rui...@tech-lab.io) wrote: > On Sun, Jun 16, 2019 at 8:57 PM Alvaro Herrera > wrote: > > Note that Joyent ended up proposing patches to fix their performance > > problem (and got them committed). Maybe it would be useful for Tiemen > > to try that code? (That commi

Re: checkpoints taking much longer than expected

2019-06-16 Thread Tiemen Ruiten
On Sun, Jun 16, 2019 at 8:57 PM Alvaro Herrera wrote: > > Note that Joyent ended up proposing patches to fix their performance > problem (and got them committed). Maybe it would be useful for Tiemen > to try that code? (That commit cherry-picks cleanly on REL_11_STABLE.) > Interesting! The per

Re: checkpoints taking much longer than expected

2019-06-16 Thread Alvaro Herrera
On 2019-Jun-14, Peter J. Holzer wrote: > There was a discussion about ZFS' COW behaviour and PostgreSQL reusing > WAL files not being a good combination about a year ago: > https://www.postgresql.org/message-id/flat/CACukRjO7DJvub8e2AijOayj8BfKK3XXBTwu3KKARiTr67M3E3w%40mail.gmail.com > > Maybe yo

Re: checkpoints taking much longer than expected

2019-06-16 Thread Stephen Frost
Greetings, * Jeff Janes (jeff.ja...@gmail.com) wrote: > On Sat, Jun 15, 2019 at 4:50 AM Tiemen Ruiten wrote: > > On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost wrote: > >> The time information is all there and it tells you what it's doing and > >> how much had to be done... If you're unhappy with

Re: checkpoints taking much longer than expected

2019-06-16 Thread Stephen Frost
Greetings, * Tiemen Ruiten (t.rui...@tech-lab.io) wrote: > On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost wrote: > > * Tiemen Ruiten (t.rui...@tech-lab.io) wrote: > > > checkpoint_timeout = 60min > > > > That seems like a pretty long timeout. > > My reasoning was that a longer recovery time to av

Re: checkpoints taking much longer than expected

2019-06-16 Thread Jeff Janes
On Sat, Jun 15, 2019 at 4:50 AM Tiemen Ruiten wrote: > > On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost wrote: > >> >> The time information is all there and it tells you what it's doing and >> how much had to be done... If you're unhappy with how long it takes to >> write out gigabytes of data an

Re: checkpoints taking much longer than expected

2019-06-16 Thread Michael Loftis
On Fri, Jun 14, 2019 at 08:02 Tiemen Ruiten wrote: > Hello, > > I setup a new 3-node cluster with the following specifications: > > 2x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2*20 cores) > 128 GB RAM > 8x Crucial MX500 1TB SSD's > > FS is ZFS, the dataset with the PGDATA directory on it has th

Re: checkpoints taking much longer than expected

2019-06-15 Thread Peter Geoghegan
On Sat, Jun 15, 2019 at 1:50 AM Tiemen Ruiten wrote: > During normal operation I don't mind that it takes a long time, but when > performing maintenance I want to be able to gracefully bring down the master > without long delays to promote one of the standby's. Maybe an "immediate" mode shutdow

Re: checkpoints taking much longer than expected

2019-06-15 Thread Tiemen Ruiten
On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost wrote: > Greetings, > > * Tiemen Ruiten (t.rui...@tech-lab.io) wrote: > > checkpoint_timeout = 60min > > That seems like a pretty long timeout. > My reasoning was that a longer recovery time to avoid writes would be acceptable because there are two m

Re: checkpoints taking much longer than expected

2019-06-14 Thread Peter J. Holzer
On 2019-06-14 16:01:40 +0200, Tiemen Ruiten wrote: > FS is ZFS, the dataset with the PGDATA directory on it has the following > properties (only non-default listed): [...] > My problem is that checkpoints are taking a long time. Even when I run a few > manual checkpoints one after the other, they k

Re: checkpoints taking much longer than expected

2019-06-14 Thread Stephen Frost
Greetings, * Tiemen Ruiten (t.rui...@tech-lab.io) wrote: > checkpoint_timeout = 60min That seems like a pretty long timeout. > My problem is that checkpoints are taking a long time. Even when I run a > few manual checkpoints one after the other, they keep taking very long, up > to 10 minutes: Y

Re: checkpoints taking much longer than expected

2019-06-14 Thread Stephen Frost
Greetings, * Ravi Krishna (ravikris...@mail.com) wrote: > On 6/14/19 10:01 AM, Tiemen Ruiten wrote: > >LOG:  checkpoint starting: immediate force wait > > Does it mean that the DB is blocked until the completion of checkpoint. > Years ago > Informix use to have this issue until they fixed around

Re: checkpoints taking much longer than expected

2019-06-14 Thread Ravi Krishna
On 6/14/19 10:01 AM, Tiemen Ruiten wrote: > LOG:  checkpoint starting: immediate force wait Does it mean that the DB is blocked until the completion of the checkpoint? Years ago Informix used to have this issue, until they fixed it around 2006.