On 9/26/19 5:55 AM, Fujii Masao wrote: > On Thu, Sep 26, 2019 at 5:22 AM David Steele <da...@pgmasters.net> wrote: >> >> While testing against PG12 I noticed the documentation states that >> recovery targets are not valid when standby.signal is present. > > Or that description in the doc is not true? Other recovery target > parameters seem to take effect even when standby.signal exists.
Yes, and this is true for or any combination of recovery.signal and standby signal as far as I can see. We have been tracking down some strange behaviors over the last few days as we have been adding PG12 support to pgBackRest. Late in the day I know, but we just got the relevant code migrated to C and we did not fancy coding it twice. The main thing is if you set recovery_target_time in postgresql.auto.conf then recovery will always try to hit that target with any combination of recovery.signal and standby.signal. But target_action is only active when recovery.signal, standby.signal, or both are present. All these tests were done on 12rc1. So given this postgresql.auto.conf: recovery_target_time = '2019-09-26 14:39:51.280711+00' restore_command = 'cp /home/vagrant/test/archive/%f "%p"' recovery_target_timeline = current recovery_target_action = promote And these settings added to postgresql.conf: wal_level = replica archive_mode = on archive_command = 'test ! -f /home/vagrant/test/archive/%f && cp %p /home/vagrant/test/archive/%f' And this backup_label: START WAL LOCATION: 0/2000028 (file 000000010000000000000002) CHECKPOINT LOCATION: 0/2000060 BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2019-09-26 14:39:49 UTC LABEL: pg_basebackup base backup START TIMELINE: 1 The backup we are recovering contains a table that exists at the target time but is dropped after that as an additional confirmation. In all the recovery scenarios below the table exists after recovery. Here's what recovery looks like with recovery.signal: 2019-09-26 14:49:52.758 UTC [25353] LOG: database system was interrupted; last known up at 2019-09-26 14:39:49 UTC 2019-09-26 14:49:52.824 UTC [25353] LOG: starting point-in-time recovery to 2019-09-26 14:39:51.280711+00 2019-09-26 14:49:52.836 UTC [25353] LOG: restored log file "000000010000000000000002" from archive 2019-09-26 14:49:52.885 UTC [25353] LOG: redo starts at 0/2000028 2019-09-26 14:49:52.894 UTC [25353] LOG: consistent recovery state reached at 0/2000100 2019-09-26 14:49:52.894 UTC [25352] LOG: database system is ready to accept read only connections 2019-09-26 14:49:52.905 UTC [25353] LOG: restored log file "000000010000000000000003" from archive 2019-09-26 14:49:52.940 UTC [25353] LOG: recovery stopping before commit of transaction 487, time 2019-09-26 14:39:54.981557+00 2019-09-26 14:49:52.940 UTC [25353] LOG: redo done at 0/30096A0 cp: cannot stat '/home/vagrant/test/archive/00000002.history': No such file or directory 2019-09-26 14:49:52.943 UTC [25353] LOG: selected new timeline ID: 2 2019-09-26 14:49:52.998 UTC [25353] LOG: archive recovery complete 2019-09-26 14:49:52.998 UTC [25353] LOG: database system is ready to accept connections This is completely normal and what you would expect. Now without recovery.signal from a fresh restore: 2019-09-26 14:52:29.491 UTC [25409] LOG: database system was interrupted; last known up at 2019-09-26 14:39:49 UTC 2019-09-26 14:52:29.574 UTC [25409] LOG: restored log file "000000010000000000000002" from archive 2019-09-26 14:52:29.622 UTC [25409] LOG: redo starts at 0/2000028 2019-09-26 14:52:29.631 UTC [25409] LOG: consistent recovery state reached at 0/2000100 2019-09-26 14:52:29.642 UTC [25409] LOG: restored log file "000000010000000000000003" from archive 2019-09-26 14:52:29.666 UTC [25409] LOG: recovery stopping before commit of transaction 487, time 2019-09-26 14:39:54.981557+00 2019-09-26 14:52:29.666 UTC [25409] LOG: redo done at 0/30096A0 2019-09-26 14:52:29.716 UTC [25408] LOG: database system is ready to accept connections Now there is no "starting point-in-time recovery" message but we are still stopping in the same place, "recovery stopping before commit of transaction 487". There is no promotion so now we are now logging on timeline 1 (so there are duplicate WAL errors as soon as archive_command runs). In PG < 12 you could do this by shutting down, removing recovery.conf and restarting, but it is now much easier to end up on the same timeline. Now with with standby.signal only from a fresh restore: 2019-09-26 14:59:36.889 UTC [25522] LOG: database system was interrupted; last known up at 2019-09-26 14:39:49 UTC 2019-09-26 14:59:36.983 UTC [25522] LOG: entering standby mode 2019-09-26 14:59:36.994 UTC [25522] LOG: restored log file "000000010000000000000002" from archive 2019-09-26 14:59:37.038 UTC [25522] LOG: redo starts at 0/2000028 2019-09-26 14:59:37.047 UTC [25522] LOG: consistent recovery state reached at 0/2000100 2019-09-26 14:59:37.047 UTC [25521] LOG: database system is ready to accept read only connections 2019-09-26 14:59:37.061 UTC [25522] LOG: restored log file "000000010000000000000003" from archive 2019-09-26 14:59:37.093 UTC [25522] LOG: recovery stopping before commit of transaction 487, time 2019-09-26 14:39:54.981557+00 2019-09-26 14:59:37.093 UTC [25522] LOG: redo done at 0/30096A0 cp: cannot stat '/home/vagrant/test/archive/00000002.history': No such file or directory 2019-09-26 14:59:37.097 UTC [25522] LOG: selected new timeline ID: 2 2019-09-26 14:59:37.270 UTC [25522] LOG: archive recovery complete cp: cannot stat '/home/vagrant/test/archive/00000001.history': No such file or directory 2019-09-26 14:59:37.338 UTC [25521] LOG: database system is ready to accept connections The cluster starts in standby mode, hits the recovery target, then promotes even though no recovery.signal is present. And finally with both recovery.signal and standby.signal you get the same as with standby.signal only. I was able to get the same results using an xid target: recovery_target_xid = 487 recovery_target_inclusive = false All of this is roughly analogous to use cases that were possible before, but there were fewer permutations then. You had no standby and no recovery target without recovery.conf so "recovery.signal" was always there, more or less. At the very least, according to the docs, none of the target options are supposed to be active unless recovery.signal is in place. Since outdated entries in postgresql.auto.conf can have effect even in the absence of recovery.signal, it seems pretty important to make sure that mechanism is working correctly - or that the caveat is clearly documented. I do think this issue needs to be addressed before GA. Fujii -- I just became aware of your email at [1] so I'll respond to that as well. -- -David da...@pgmasters.net [1] https://www.postgresql.org/message-id/CAHGQGwEYYg_Ng%2B03FtZczacCpYgJ2Pn%3DB_wPtWF%2BFFLYDgpa1g%40mail.gmail.com