Re: Standby trying "restore_command" before local WAL

2018-08-08 Thread Stephen Frost
Greetings, * David Steele (da...@pgmasters.net) wrote: > I can see cases where it *might* be worth it, but several backup tools > support prefetch and/or parallelism which should be able to keep > Postgres fed with WAL unless there is very high latency to the repo. > I'm not sure the small perform

Re: Standby trying "restore_command" before local WAL

2018-08-08 Thread David Steele
On 8/8/18 11:45 AM, Stephen Frost wrote: > > * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: >> On 08/08/2018 04:08 PM, David Steele wrote: >>> On 8/7/18 12:05 PM, Stephen Frost wrote: > All I'm saying is that (assuming my understanding of RestoreArchivedFile > is > correct) we c

Re: Standby trying "restore_command" before local WAL

2018-08-08 Thread Stephen Frost
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On 08/08/2018 04:08 PM, David Steele wrote: > >On 8/7/18 12:05 PM, Stephen Frost wrote: > >>>All I'm saying is that (assuming my understanding of RestoreArchivedFile is > >>>correct) we can't just do that in the current restore_comm

Re: Standby trying "restore_command" before local WAL

2018-08-08 Thread Tomas Vondra
On 08/08/2018 04:08 PM, David Steele wrote: On 8/7/18 12:05 PM, Stephen Frost wrote: All I'm saying is that (assuming my understanding of RestoreArchivedFile is correct) we can't just do that in the current restore_command. We do need a way to ask the archive for some metadata/checksums, and
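
The "ask the archive for some metadata/checksums" idea above can be sketched as a pre-check. It trusts a segment already in pg_wal/ only if the archive reports the same whole-file checksum. The lookup function `archive_checksum` is hypothetical: PostgreSQL has no such interface. It stands in for something like a backup tool's manifest. Note that the check costs a full read of the local file plus a round trip to the archive, which may eat into the savings being discussed.

```python
import hashlib
import os
from typing import Callable, Optional


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so a 16 MB segment is not loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def local_copy_is_trusted(walfile: str, pg_wal_dir: str,
                          archive_checksum: Callable[[str], Optional[str]]) -> bool:
    """Trust pg_wal/<walfile> only if the archive reports the same checksum.

    archive_checksum is a HYPOTHETICAL lookup (e.g. against a backup tool's
    manifest); it returns None when the archive has no record of the file.
    """
    local = os.path.join(pg_wal_dir, walfile)
    if not os.path.isfile(local):
        return False
    expected = archive_checksum(walfile)
    return expected is not None and expected == sha256_of(local)
```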

Re: Standby trying "restore_command" before local WAL

2018-08-08 Thread David Steele
On 8/7/18 12:05 PM, Stephen Frost wrote: >> >> All I'm saying is that (assuming my understanding of RestoreArchivedFile is >> correct) we can't just do that in the current restore_command. We do need a >> way to ask the archive for some metadata/checksums, and restore_command is >> too late. > > Y

Re: Standby trying "restore_command" before local WAL

2018-08-08 Thread David Steele
On 8/7/18 11:42 AM, Stephen Frost wrote: > >>> CRC's are per WAL record, and possibly some WAL records might not be ok >>> to replay, or at least we need to make sure that we replay the right set >>> of WAL in the right order even when there are partial WAL files being >>> given to PG (that aren't
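
The point that CRCs are per WAL record rather than per file can be illustrated with a toy format. Each record carries its own length and CRC, and a reader stops at the first record that does not check out. The layout and `zlib.crc32` are assumptions chosen for brevity. PostgreSQL's real record header and its CRC-32C differ. The takeaway is the same, though: a partly written file passes every per-record check up to where writing stopped, so these CRCs alone cannot show that a file is complete.

```python
import struct
import zlib
from typing import List

HEADER = struct.Struct("<II")  # (payload length, crc32 of payload): toy layout, not PostgreSQL's


def encode_records(payloads: List[bytes]) -> bytes:
    """Concatenate records, each prefixed by its own length and CRC."""
    out = bytearray()
    for p in payloads:
        out += HEADER.pack(len(p), zlib.crc32(p))
        out += p
    return bytes(out)


def scan_valid_records(segment: bytes) -> List[bytes]:
    """Return payloads up to the first record whose header or CRC does not check out.

    "End of valid data" and "file was cut short" look the same here:
    both simply stop the scan without raising an error.
    """
    records = []
    pos = 0
    while pos + HEADER.size <= len(segment):
        length, crc = HEADER.unpack_from(segment, pos)
        if length == 0:  # zero-filled tail is treated as end of data
            break
        start = pos + HEADER.size
        payload = segment[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break
        records.append(payload)
        pos = start + length
    return records
```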

Re: Standby trying "restore_command" before local WAL

2018-08-07 Thread Stephen Frost
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > That's how I read this part of RestoreArchivedFile: > > https://github.com/postgres/postgres/blob/master/src/backend/access/transam/xlogarchive.c#L110 > > The very first thing it does is checking if the local file exists, and if i

Re: Standby trying "restore_command" before local WAL

2018-08-07 Thread Tomas Vondra
On 08/07/2018 05:42 PM, Stephen Frost wrote: Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: On 08/06/2018 09:32 PM, Stephen Frost wrote: * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: On 08/06/2018 06:11 PM, Stephen Frost wrote: WAL checksums are per WAL record, not

Re: Standby trying "restore_command" before local WAL

2018-08-07 Thread Stephen Frost
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On 08/06/2018 09:32 PM, Stephen Frost wrote: > >* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > >>On 08/06/2018 06:11 PM, Stephen Frost wrote: > >WAL checksums are per WAL record, not across the whole file... And, > >yes, th

Re: Standby trying "restore_command" before local WAL

2018-08-07 Thread Tomas Vondra
On 08/06/2018 09:32 PM, Stephen Frost wrote: Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: On 08/06/2018 06:11 PM, Stephen Frost wrote: * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: On 08/06/2018 05:19 PM, Stephen Frost wrote: * David Steele (da...@pgmasters.net)

Re: Standby trying "restore_command" before local WAL

2018-08-06 Thread Stephen Frost
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On 08/06/2018 06:11 PM, Stephen Frost wrote: > >* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > >>On 08/06/2018 05:19 PM, Stephen Frost wrote: > >>>* David Steele (da...@pgmasters.net) wrote: > I think for the stated scen

Re: Standby trying "restore_command" before local WAL

2018-08-06 Thread Tomas Vondra
On 08/06/2018 06:11 PM, Stephen Frost wrote: Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: On 08/06/2018 05:19 PM, Stephen Frost wrote: * David Steele (da...@pgmasters.net) wrote: I think for the stated scenario (known good standby that has been shutdown gracefully) it ma

Re: Standby trying "restore_command" before local WAL

2018-08-06 Thread Stephen Frost
Greetings, * Jaime Casanova (jaime.casan...@2ndquadrant.com) wrote: > On Mon, 6 Aug 2018 at 11:01, Stephen Frost wrote: > > > What about the following cases? > > > 1. replica host crashed, and in pg_wal we have a few thousands WAL files. > > > > If this is the case then the replica was very far b

Re: Standby trying "restore_command" before local WAL

2018-08-06 Thread Jaime Casanova
On Mon, 6 Aug 2018 at 11:01, Stephen Frost wrote: > > > What about the following cases? > > 1. replica host crashed, and in pg_wal we have a few thousands WAL files. > > If this is the case then the replica was very far behind on replay, > presumably, and in some of those cases rebuilding the repl

Re: Standby trying "restore_command" before local WAL

2018-08-06 Thread Stephen Frost
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On 08/06/2018 05:19 PM, Stephen Frost wrote: > >* David Steele (da...@pgmasters.net) wrote: > >>I think for the stated scenario (known good standby that has been > >>shutdown gracefully) it makes perfect sense to trust the contents

Re: Standby trying "restore_command" before local WAL

2018-08-06 Thread Stephen Frost
Greetings, * Alexander Kukushkin (cyberd...@gmail.com) wrote: > 2018-07-31 20:25 GMT+02:00 Stephen Frost : > > There's still a question here, at least from my perspective, as to which > > is actually going to be faster to perform recovery based off of. A good > > restore command, which pre-fetche

Re: Standby trying "restore_command" before local WAL

2018-08-06 Thread Tomas Vondra
On 08/06/2018 05:19 PM, Stephen Frost wrote: Greetings, * David Steele (da...@pgmasters.net) wrote: I think for the stated scenario (known good standby that has been shutdown gracefully) it makes perfect sense to trust the contents of pg_wal. Call this scenario #1. An alternate scenario (#2)

Re: Standby trying "restore_command" before local WAL

2018-08-06 Thread Stephen Frost
Greetings, * David Steele (da...@pgmasters.net) wrote: > I think for the stated scenario (known good standby that has been > shutdown gracefully) it makes perfect sense to trust the contents of > pg_wal. Call this scenario #1. > > An alternate scenario (#2) is that the data directory was copied
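
Telling scenario #1 (a known good standby that was shut down gracefully) apart from other cases could start from the cluster state that `pg_controldata` prints. The heuristic below is an assumption, not something proposed in the thread. It treats only "shut down in recovery" as a signal that pg_wal/ may be trusted. It has to run before the server is started, because the recorded state is likely to change once recovery begins. It also cannot tell whether pg_wal/ was later copied or modified by hand, which is exactly scenario #2.

```python
from typing import Dict


def parse_controldata(text: str) -> Dict[str, str]:
    """Turn pg_controldata output ("Key:   value" per line) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def pg_wal_looks_trustworthy(controldata_text: str) -> bool:
    """Scenario #1 heuristic: only a standby that was shut down cleanly while in recovery."""
    state = parse_controldata(controldata_text).get("Database cluster state")
    return state == "shut down in recovery"
```

Typical use would be to capture the output of `pg_controldata -D "$PGDATA"` before `pg_ctl start` and record the result somewhere a restore_command wrapper can read it.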

Re: Standby trying "restore_command" before local WAL

2018-08-03 Thread Michael Paquier
On Tue, Jul 31, 2018 at 02:55:58PM +0200, Emre Hasegeli wrote: > == The Workarounds == > > We can possibly work around this inside the "restore_command" or > by delaying the archiving. Working around inside the "restore_command" > would involve checking whether the file exists under pg_wal/. Thi
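
The workaround quoted above, checking inside the restore_command whether the file already exists under pg_wal/, might look like the wrapper below. It relies on restore_command running with the data directory as its working directory, so `%p` and a relative `pg_wal` both resolve there. The script name, the `WAL_ARCHIVE_DIR` variable and the directory-based archive are hypothetical. The wrapper trusts pg_wal/ unconditionally, so it only makes sense in the "known good standby" case discussed in the thread.

```python
import os
import shutil
import sys
from typing import Callable


def restore_wal(walfile: str, dest: str, pg_wal_dir: str,
                fetch_from_archive: Callable[[str, str], bool]) -> str:
    """Satisfy one restore_command call, trying pg_wal/ before the archive."""
    local = os.path.join(pg_wal_dir, walfile)
    if os.path.isfile(local):
        # Copy rather than move, so the file PostgreSQL left in pg_wal/ stays in place.
        shutil.copyfile(local, dest)
        return "local"
    if fetch_from_archive(walfile, dest):
        return "archive"
    raise FileNotFoundError(walfile)


def fetch_from_dir_archive(walfile: str, dest: str) -> bool:
    """HYPOTHETICAL archive: a plain directory named by $WAL_ARCHIVE_DIR."""
    src = os.path.join(os.environ.get("WAL_ARCHIVE_DIR", "/nonexistent"), walfile)
    if not os.path.isfile(src):
        return False
    shutil.copyfile(src, dest)
    return True


def main(walfile: str, dest: str) -> int:
    try:
        restore_wal(walfile, dest, "pg_wal", fetch_from_dir_archive)
    except FileNotFoundError:
        return 1  # non-zero tells PostgreSQL the file is not available
    return 0


# Assumed invocation: restore_command = 'python3 restore_local_first.py %f %p'
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```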

Re: Standby trying "restore_command" before local WAL

2018-08-03 Thread David Steele
On 8/2/18 4:08 PM, Robert Haas wrote: > On Wed, Aug 1, 2018 at 7:14 AM, Emre Hasegeli wrote: >>> There's still a question here, at least from my perspective, as to which >>> is actually going to be faster to perform recovery based off of. A good >>> restore command, which pre-fetches the WAL in p

Re: Standby trying "restore_command" before local WAL

2018-08-03 Thread Simon Riggs
On 2 August 2018 at 21:08, Robert Haas wrote: > On Wed, Aug 1, 2018 at 7:14 AM, Emre Hasegeli wrote: >>> There's still a question here, at least from my perspective, as to which >>> is actually going to be faster to perform recovery based off of. A good >>> restore command, which pre-fetches the

Re: Standby trying "restore_command" before local WAL

2018-08-03 Thread Alexander Kukushkin
Hi, 2018-07-31 20:25 GMT+02:00 Stephen Frost : > > > There's still a question here, at least from my perspective, as to which > is actually going to be faster to perform recovery based off of. A good > restore command, which pre-fetches the WAL in parallel and gets it local > and on the same file

Re: Standby trying "restore_command" before local WAL

2018-08-02 Thread Robert Haas
On Wed, Aug 1, 2018 at 7:14 AM, Emre Hasegeli wrote: >> There's still a question here, at least from my perspective, as to which >> is actually going to be faster to perform recovery based off of. A good >> restore command, which pre-fetches the WAL in parallel and gets it local >> and on the sam

Re: Standby trying "restore_command" before local WAL

2018-08-01 Thread Emre Hasegeli
> There's still a question here, at least from my perspective, as to which > is actually going to be faster to perform recovery based off of. A good > restore command, which pre-fetches the WAL in parallel and gets it local > and on the same filesystem, meaning that the restore_command only has to
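
A restore command that "pre-fetches the WAL in parallel and gets it local and on the same filesystem" could be sketched as follows. Upcoming segment names are derived from the requested one, fetched concurrently into a spool directory, and later handed to PostgreSQL with a rename. The `fetch` callable and the spool layout are assumptions. The sketch also ignores timeline switches and `.partial`/`.history` files, which real backup tools have to handle.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

WAL_SEG_SIZE = 16 * 1024 * 1024  # default; configurable at initdb since PostgreSQL 11


def following_segments(walfile: str, count: int, seg_size: int = WAL_SEG_SIZE) -> List[str]:
    """Names of the next `count` segments on the same timeline."""
    tli, log, seg = (int(walfile[i:i + 8], 16) for i in (0, 8, 16))
    per_id = 0x100000000 // seg_size  # segments per "log" id: 256 for 16 MB segments
    segno = log * per_id + seg
    return [f"{tli:08X}{n // per_id:08X}{n % per_id:08X}"
            for n in range(segno + 1, segno + 1 + count)]


def prefetch(walfile: str, spool_dir: str, count: int,
             fetch: Callable[[str, str], bool], workers: int = 4) -> List[str]:
    """Fetch upcoming segments in parallel into spool_dir; return the names fetched."""
    wanted = [w for w in following_segments(walfile, count)
              if not os.path.exists(os.path.join(spool_dir, w))]

    def fetch_one(w: str) -> bool:
        final = os.path.join(spool_dir, w)
        tmp = final + ".tmp"  # never expose a half-downloaded file under its real name
        if fetch(w, tmp):
            os.replace(tmp, final)
            return True
        return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        ok = list(pool.map(fetch_one, wanted))
    return [w for w, got in zip(wanted, ok) if got]


def restore_from_spool(walfile: str, dest: str, spool_dir: str) -> bool:
    """Cheap path: when spool_dir shares a filesystem with pg_wal/, this is just a rename."""
    spooled = os.path.join(spool_dir, walfile)
    if not os.path.exists(spooled):
        return False
    os.replace(spooled, dest)
    return True
```

Writing each download to a temporary name first means a restore call that races with a prefetch either sees a complete segment or none at all.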

Re: Standby trying "restore_command" before local WAL

2018-07-31 Thread Stephen Frost
Greetings, * Sergei Kornilov (s...@zsrv.org) wrote: > > As mentioned by others, it sounds like we could have an option to try > > contacting the primary before running restore_command > Why about primary? > If we have restore_command on slave (or during point in time recovery) - we > force using

Re: Standby trying "restore_command" before local WAL

2018-07-31 Thread Sergei Kornilov
Hello > As mentioned by others, it sounds like we could have an option to try > contacting the primary before running restore_command Why about primary? If we have restore_command on slave (or during point in time recovery) - we force using XLOG_FROM_ARCHIVE, even if XLOG_FROM_PG_WAL source can p

Re: Standby trying "restore_command" before local WAL

2018-07-31 Thread Stephen Frost
Greetings, * Emre Hasegeli (e...@hasegeli.com) wrote: > This issue came to our attention after we migrated an application from > an object storage backend, and noticed that restarting a standby node > takes hours or sometimes days. > > We are using shared WAL archive and find it practical to have

Re: Standby trying "restore_command" before local WAL

2018-07-31 Thread Simon Riggs
On 31 July 2018 at 13:55, Emre Hasegeli wrote: > Currently the startup process tries the "restore_command" before > the WAL files locally available under pg_wal/ [1]. I believe we should > change this behavior. > If there will be a consensus on fixing this, I can try to prepare > a patch. The c
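
The behaviour described in the original report, where restore_command is tried before files already present under pg_wal/, can be modelled as a source order. The enum is a simplified stand-in for the XLOG_FROM_* sources named in the thread, not PostgreSQL code. `prefer_pg_wal` is a hypothetical switch representing the proposed change. Streaming from the primary is placed last, consistent with the suggestion elsewhere in the thread to contact the primary earlier.

```python
from enum import Enum
from typing import Iterable, List, Optional


class WalSource(Enum):
    """Simplified stand-ins for XLOG_FROM_ARCHIVE, XLOG_FROM_PG_WAL and XLOG_FROM_STREAM."""
    ARCHIVE = "restore_command"
    PG_WAL = "pg_wal/"
    STREAM = "streaming from the primary"


def source_order(has_restore_command: bool, prefer_pg_wal: bool = False) -> List[WalSource]:
    """Order in which the next segment is looked for; prefer_pg_wal is HYPOTHETICAL."""
    order = [WalSource.PG_WAL]
    if has_restore_command:
        if prefer_pg_wal:
            order.append(WalSource.ARCHIVE)  # proposed: local files first
        else:
            order.insert(0, WalSource.ARCHIVE)  # reported current behaviour
    order.append(WalSource.STREAM)
    return order


def first_source_with_file(order: List[WalSource],
                           available: Iterable[WalSource]) -> Optional[WalSource]:
    """First source in `order` that can supply the segment, or None."""
    have = set(available)
    return next((s for s in order if s in have), None)
```

With the segment present both locally and in the archive, the current order still picks the archive. That is the slow restart reported at the start of the thread.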