Greetings,

* Alexander Kukushkin (cyberd...@gmail.com) wrote:
> 2018-07-31 20:25 GMT+02:00 Stephen Frost <sfr...@snowman.net>:
> > There's still a question here, at least from my perspective, as to which
> > is actually going to be faster to perform recovery based off of.  A good
> > restore command, which pre-fetches the WAL in parallel and gets it local
> > and on the same filesystem, meaning that the restore_command only has to
> > execute essentially a 'mv' and return back to PG for the next WAL file,
> > is really rather fast, compared to streaming that same data over the
> > network with a single TCP connection to the primary.  Of course, there's
> > a lot of variables there and it depends on the network speed between the
> > various pieces, but I've certainly had cases where a replica catches up
> > much faster using restore command than streaming from the primary.
>
> Sure, mv is incredibly fast, but not calling external script/binary at
> all is still faster than calling it.

I don't believe I was disputing that, apologies if it came across that
way.  Certainly, reading files directly without going through
restore_command is going to be faster than having to call
restore_command.  The point I was attempting to make is that using
restore_command might be (and in some cases, certainly is) faster than
streaming from a primary.

> What about the following cases?
> 1. replica host crashed, and in pg_wal we have a few thousands WAL files.

If this is the case then the replica was very far behind on replay,
presumably, and in some of those cases rebuilding the replica might very
well be faster than replaying all of that WAL.  This case does sound
like it should be alright though.

> 2. we are creating a new replica with pg_basebackup -X stream, it
> takes a long time and again leaves a few thousands WAL files.

This is certainly typical and also should be a safe case, and therefore
seems like a good case where we'd want to be able to tell the system to
use what's in pg_wal first.  Perhaps that could be an option in
recovery.conf, which pg_basebackup and other tools that manage the
pg_wal directory and ensure that all the WAL there is valid would be
able to write into recovery.conf.

> In both cases, if there is no restore_command in the recovery.conf,
> postgres will happily read WAL files from pg_wal and only when there
> is nothing left it will try to start streaming.
>
> But, if restore_command is defined, it will always call the
> restore_command, for every single WAL file it wants to restore.
> If the restore_command exits with non zero exit code, postgres is
> happily restoring the file from pg_wal!
> And, only if the file is not there or not valid, postgres is trying to
> start streaming.

Yeah, I have to agree that it's not great that we don't seem to be
entirely consistent here, as Robert pointed out up-thread.

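To make the ordering question concrete, here is a rough toy model in
Python of the lookup behaviour described above.  It is only a sketch
and not PostgreSQL code: every name in it (fetch_wal, the three source
functions, the two order lists) is hypothetical, and it deliberately
does not model whether the WAL found in pg_wal is valid to replay.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy model only: each "source" returns segment data, or None on failure.
Source = Callable[[str], Optional[bytes]]

def fetch_wal(segment: str,
              order: List[Tuple[str, Source]]) -> Optional[Tuple[str, bytes]]:
    """Try each source in turn; return (source_name, data) for the first hit."""
    for name, source in order:
        data = source(segment)
        if data is not None:
            return name, data
    return None

calls: Dict[str, int] = {"restore_command": 0}

def failing_restore_command(segment: str) -> Optional[bytes]:
    # Models an external command that is invoked and exits non-zero.
    calls["restore_command"] += 1
    return None

# Segments already present locally (e.g. left behind by a base backup).
local_pg_wal = {
    "000000010000000000000001": b"wal-1",
    "000000010000000000000002": b"wal-2",
}

def read_pg_wal(segment: str) -> Optional[bytes]:
    return local_pg_wal.get(segment)

def stream_from_primary(segment: str) -> Optional[bytes]:
    return b"streamed-" + segment.encode()

# Order as described in the quoted text when restore_command is set.
CURRENT_ORDER = [("restore_command", failing_restore_command),
                 ("pg_wal", read_pg_wal),
                 ("streaming", stream_from_primary)]

# One of the alternative orders suggested in this thread.
PROPOSED_ORDER = [("pg_wal", read_pg_wal),
                  ("restore_command", failing_restore_command),
                  ("streaming", stream_from_primary)]
```

With CURRENT_ORDER the external command is called once per segment
even though the segment is sitting in pg_wal; with PROPOSED_ORDER it is
only called once pg_wal has nothing to offer.  What the model leaves out
is exactly the open question here: whether the files in pg_wal can be
trusted.
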
> From my point of view, there is no difference between having no
> restore_command and relying only on streaming replication and having
> the restore_command which always fails.
> Therefore I don't really understand why we stick to the
> "restore_command => pg_wal => streaming" and why it is not possible to
> change it to "pg_wal => restore_command => streaming" or maybe even
> (pg_wal => streaming => restore_command).

I don't think I disagreed anywhere about having the option.  There's a
good point to be made that if we can figure out what the right thing to
do is then we should just do that instead of having an option for it.

If there's any case where the pg_wal directory might have invalid WAL to
be replayed on top of the current cluster, though, then we shouldn't
just be using that WAL and instead should be asking the user to let us
know if the WAL is ok to use.  If we can know when the WAL is invalid
and ignore it in those cases, then we should just go ahead and do that,
but I'm unconvinced that's actually the case in a situation such as what
David Steele described in his scenario #2.

> I am not sure about the last option, but in any case, before going to
> some remote place, postgres should try to find (and try to replay) the
> WAL file in the pg_wal.

Only if we know that it's valid to be replayed over the current cluster.

Thanks!

Stephen