Re: Standby trying "restore_command" before local WAL

Stephen Frost Tue, 31 Jul 2018 11:26:28 -0700

Greetings,

* Sergei Kornilov (s...@zsrv.org) wrote:
> > As mentioned by others, it sounds like we could have an option to try
> > contacting the primary before running restore_command
> Why the primary?
> If we have restore_command on slave (or during point-in-time recovery), we
> force using XLOG_FROM_ARCHIVE, even if the XLOG_FROM_PG_WAL source can
> provide the next WAL. As the xlog.c comment says [1]:


Right..

> > * We just successfully read a file in pg_wal. We prefer files in
> > * the archive over ones in pg_wal, so try the next file again
> > * from the archive first.
> 
> Do we have an actual reason to prefer restore_command over using
> local WAL files first?

Yes, as discussed in the comments mentioned up-thread.

> Partially written WAL? Streaming replication can leave partially written WAL 
> and we can handle this correctly.

Sure, though even in that case there seems to be a reasonable use-case
here for an option to control whether restore_command is used to get the
next needed WAL or whether the primary should be asked for the WAL first.

There's still a question here, at least from my perspective, as to which
source is actually going to be faster to perform recovery from.  A good
restore_command, one which pre-fetches the WAL in parallel and stages it
locally on the same filesystem, so that the restore_command itself only
has to execute essentially a 'mv' and return to PG for the next WAL
file, is really rather fast compared to streaming that same data over
the network on a single TCP connection to the primary.  Of course, there
are a lot of variables there and it depends on the network speed between
the various pieces, but I've certainly had cases where a replica catches
up much faster using restore_command than by streaming from the primary.
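
To illustrate the "essentially a 'mv'" case, here is a minimal sketch of
the hand-over half of such a restore_command.  It assumes a separate
pre-fetcher (not shown) has already downloaded the segments in parallel
into a spool directory on the same filesystem as the data directory.
The spool path and the restore_wal() function name are made up for the
example; only the %f and %p placeholders and the exit-status convention
come from PostgreSQL itself.

```python
import os
import sys

# Hypothetical spool directory filled by a separate parallel pre-fetcher.
# It is assumed to live on the same filesystem as the data directory.
SPOOL_DIR = "/var/spool/wal_prefetch"


def restore_wal(spool_dir: str, wal_name: str, dest_path: str) -> int:
    """Hand one pre-fetched WAL file to the server.

    Returns 0 if the file was moved into place and 1 if it was not in
    the spool, so the result can be used directly as an exit status.
    """
    src = os.path.join(spool_dir, wal_name)
    if not os.path.isfile(src):
        # A non-zero status tells the server the file is not available
        # from the archive, so it can try its other WAL sources.
        return 1
    # On the same filesystem this is a rename, not a copy.
    os.replace(src, dest_path)
    return 0


# Only act when called the way restore_command would call it:
#   script.py %f %p
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(restore_wal(SPOOL_DIR, sys.argv[1], sys.argv[2]))
```

With the script saved somewhere (the path here is again hypothetical),
the setting would look something like
restore_command = 'python3 /usr/local/bin/restore_wal.py %f %p'.
If the spool were on a different filesystem the rename would fail and a
copy would be needed instead, which gives up most of the speed advantage.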

Thanks!

Stephen
