Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-18 Thread Michael Paquier
On Tue, Apr 19, 2022 at 09:45:11AM +1200, Thomas Munro wrote: > Delayed response to the question on how I did that, because it was a 4 > day weekend down here and I got distracted by sunshine... Happy Easter. > I think that sort of thing actually worked when I tried it on a > beefier workstation,

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-18 Thread Thomas Munro
On Mon, Apr 18, 2022 at 11:49 AM Michael Paquier wrote: > On Sun, Apr 17, 2022 at 10:56:08AM -0400, Andrew Dunstan wrote: > > I don't really think it's Cluster.pm's business to deal with that. It > > takes an install path as given either explicitly or implicitly. > > > > It shouldn't be too hard t

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-18 Thread Andrew Dunstan
On 2022-04-17 Su 19:49, Michael Paquier wrote: > On Sun, Apr 17, 2022 at 10:56:08AM -0400, Andrew Dunstan wrote: >> I don't really think it's Cluster.pm's business to deal with that. It >> takes an install path as given either explicitly or implicitly. >> >> It shouldn't be too hard to get Makefi

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-17 Thread Michael Paquier
On Tue, Apr 12, 2022 at 12:49:48PM +0900, Michael Paquier wrote: > This comes from df86e52, where we want to recovery a history file that > would be created as RECOVERYHISTORY and make sure that the file gets > removed at the end of recovery. So $standby2 should choose a new > timeline different f

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-17 Thread Michael Paquier
On Sun, Apr 17, 2022 at 10:56:08AM -0400, Andrew Dunstan wrote: > I don't really think it's Cluster.pm's business to deal with that. It > takes an install path as given either explicitly or implicitly. > > It shouldn't be too hard to get Makefile.global to install valgrind > wrappers into the tmp_

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-17 Thread Andrew Dunstan
On 2022-04-17 Su 00:17, Michael Paquier wrote: >> Since it's quite painful to run TAP tests under valgrind, I found a >> place to stick a plain old sleep to repro these problems: > Actually, I am wondering how you are patching Cluster.pm to do that. I don't really think it's Cluster.pm's busin

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-16 Thread Michael Paquier
On Sun, Apr 17, 2022 at 08:56:33AM +1200, Thomas Munro wrote: > Under valgrind I got "Undefined subroutine &main::usleep called at > t/002_archiving.pl line 103" so I added "use Time::HiRes qw(usleep);", > and now I get past the first 4 tests with your patch, but then > promotion times out, not sur

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-16 Thread Thomas Munro
On Tue, Apr 12, 2022 at 3:49 PM Michael Paquier wrote: > All that stuff leads me to the attached. Thoughts? Under valgrind I got "Undefined subroutine &main::usleep called at t/002_archiving.pl line 103" so I added "use Time::HiRes qw(usleep);", and now I get past the first 4 tests with your pat

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-11 Thread Michael Paquier
On Mon, Apr 11, 2022 at 06:48:58PM +1200, Thomas Munro wrote: > 1. This test had some pre-existing bugs/races, which hadn't failed > before due to scheduling, even under Valgrind. The above changes > appear to fix those problems. To Michael for comment. Yeah, there are two problems here. From

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-11 Thread Michael Paquier
On Mon, Apr 11, 2022 at 06:48:58PM +1200, Thomas Munro wrote: > Sorry for the delay... I got a bit confused about the different things > going on in this thread but I hope I've got it now: > > 1. This test had some pre-existing bugs/races, which hadn't failed > before due to scheduling, even unde

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-10 Thread Thomas Munro
On Sat, Apr 9, 2022 at 12:59 PM Andres Freund wrote: > On 2022-04-08 17:55:51 -0400, Tom Lane wrote: > > I tried adjusting the patch so it does guarantee that (as attached), > > and in two out of two tries it got past the archive_cleanup_command > > failure but then hung up waiting for standby2 to

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-08 Thread Andres Freund
Hi, On 2022-04-08 17:55:51 -0400, Tom Lane wrote: > I tried adjusting the patch so it does guarantee that (as attached), > and in two out of two tries it got past the archive_cleanup_command > failure but then hung up waiting for standby2 to promote. Adding $node_standby->safe_psql('postgres', "

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-08 Thread Andres Freund
Hi, On 2022-04-08 17:55:51 -0400, Tom Lane wrote: > After seeing skink's results, I tried running that test under valgrind > here, and it fails just like that every time. skink's history allows > us to bound the failure introduction between 79b716cfb7 and > d7ab2a9a3c, which I think makes it just

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-08 Thread Tom Lane
Andres Freund writes: > On 2022-04-07 13:57:45 -0400, Tom Lane wrote: >> Yeah, with only one instance it could just be cosmic rays or something. >> However, assuming it is real, I guess I wonder why we don't say >> CHECKPOINT_FORCE in standby mode too. > I guess it might partially be that restart

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-08 Thread Tom Lane
Andres Freund writes: > On 2022-04-07 13:57:45 -0400, Tom Lane wrote: >> Yeah, with only one instance it could just be cosmic rays or something. Not cosmic rays: skink has shown the same symptom three times running. Looks like maybe the archive_cleanup_command itself is doing something it shouldn

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-07 Thread Andres Freund
Hi, On 2022-04-07 13:57:45 -0400, Tom Lane wrote: > Yeah, with only one instance it could just be cosmic rays or something. > However, assuming it is real, I guess I wonder why we don't say > CHECKPOINT_FORCE in standby mode too. I guess it might partially be that restartpoints require a checkpoi

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-07 Thread Tom Lane
Andres Freund writes: > On 2022-04-07 13:40:30 -0400, Tom Lane wrote: >> This test is sending a CHECKPOINT command to the standby and >> expecting it to run the archive_cleanup_command, but it looks >> like the standby did not actually run any checkpoint: >> ... >> I wondered if the recent pg_stat

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-07 Thread Andres Freund
Hi, On 2022-04-07 13:40:30 -0400, Tom Lane wrote: > Michael Paquier writes: > > Add TAP test for archive_cleanup_command and recovery_end_command > > grassquit just showed a non-reproducible failure in this test [1]: I was just staring at that as well. > # Postmaster PID for node "standby" is

Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-07 Thread Tom Lane
Michael Paquier writes: > Add TAP test for archive_cleanup_command and recovery_end_command grassquit just showed a non-reproducible failure in this test [1]: # Postmaster PID for node "standby" is 291160 ok 1 - check content from archives not ok 2 - archive_cleanup_command executed on checkpoin