Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Tom Lane
Andrew Dunstan writes: > On 4/8/20 3:41 PM, Robert Haas wrote: >> I don't understand what the local $ENV{MSYS2_ARG_CONV_EXCL} = >> $source_ts_prefix does, > You don't want to know > See for the > gory details. I don't want to know

Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Andrew Dunstan
On 4/8/20 3:41 PM, Robert Haas wrote: > On Wed, Apr 8, 2020 at 1:59 PM Tom Lane wrote: >> I guess we could commit it and find out. I'm all for the simpler >> coding if it works. > I don't understand what the local $ENV{MSYS2_ARG_CONV_EXCL} = > $source_ts_prefix does, You don't want to know .

Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 1:59 PM Tom Lane wrote: > I guess we could commit it and find out. I'm all for the simpler > coding if it works. I don't understand what the local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix does, but the remove/unlink condition was suggested by Amit Kapila on the basis

Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Tom Lane
Andrew Dunstan writes: > OK, tricky, but here's what I did to get this working on fairywren. > First, on Msys2 there is a problem with name mangling. We've had to fix > this before by telling it to ignore certain argument prefixes. > Second, once that was fixed rmdir was failing on the tablespace.

Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Andrew Dunstan
On 4/7/20 9:42 AM, Andrew Dunstan wrote: > On Tue, Apr 7, 2020 at 12:37 AM Tom Lane wrote: >> Robert Haas writes: >>> Taking stock of the situation this morning, most of the buildfarm is >>> now green. There are three failures, on eelpout (6 hours ago), >>> fairywren (17 hours ago), and hyrax (3

Re: backup manifests and contemporaneous buildfarm failures

2020-04-07 Thread Andrew Dunstan
On Tue, Apr 7, 2020 at 12:37 AM Tom Lane wrote: > > Robert Haas writes: > > Taking stock of the situation this morning, most of the buildfarm is > > now green. There are three failures, on eelpout (6 hours ago), > > fairywren (17 hours ago), and hyrax (3 days, 7 hours ago). > > fairywren has now

Re: backup manifests and contemporaneous buildfarm failures

2020-04-07 Thread Andrew Dunstan
On Mon, Apr 6, 2020 at 1:18 AM Fabien COELHO wrote: > > > Hello, > > >> Do I need to precede those with some recursive chmod commands? Perhaps > >> the client should refuse to run if there is still something left after > >> these. > > > > I think the latter would be a very good idea, just so that

Re: backup manifests and contemporaneous buildfarm failures

2020-04-06 Thread Tom Lane
Robert Haas writes: > Taking stock of the situation this morning, most of the buildfarm is > now green. There are three failures, on eelpout (6 hours ago), > fairywren (17 hours ago), and hyrax (3 days, 7 hours ago). fairywren has now done this twice in the pg_validatebackupCheck step: exec fail

Re: backup manifests and contemporaneous buildfarm failures

2020-04-06 Thread Andrew Dunstan
On 4/6/20 7:53 AM, Robert Haas wrote: > On Sun, Apr 5, 2020 at 4:07 PM Andrew Dunstan > wrote: >> Do I need to precede those with some recursive chmod commands? > +1. > >> Perhaps >> the client should refuse to run if there is still something left after >> these. > +1 to that, too. > See https

Re: backup manifests and contemporaneous buildfarm failures

2020-04-06 Thread Robert Haas
On Sun, Apr 5, 2020 at 4:07 PM Andrew Dunstan wrote: > Do I need to precede those with some recursive chmod commands? +1. > Perhaps > the client should refuse to run if there is still something left after > these. +1 to that, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The E
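The two ideas endorsed above (a recursive chmod before the cleanup, and refusing to continue if anything is left afterwards) can be sketched roughly as follows. This is only an illustration in Python, not the buildfarm client's actual Perl code; the names force_remove_tree and demo are hypothetical, and the directory layout in demo merely imitates the mode-0 pg_subtrans directory discussed in this thread.

```python
import os
import shutil
import stat
import tempfile


def force_remove_tree(root):
    """Give the owner rwx on every directory under root, then delete the tree.

    Raises RuntimeError if anything is still present afterwards, so that a
    stale tree is reported up front rather than causing an obscure failure
    later in the run.
    """
    if not os.path.lexists(root):
        return
    os.chmod(root, stat.S_IRWXU)
    # Top-down walk: each subdirectory is made accessible before os.walk
    # tries to descend into it.
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # do not chmod symlink targets
                os.chmod(path, stat.S_IRWXU)
    shutil.rmtree(root)
    if os.path.lexists(root):
        raise RuntimeError(f"could not remove {root}; refusing to continue")


def demo():
    """Build a tree containing a mode-0 directory and remove it."""
    base = tempfile.mkdtemp()
    build = os.path.join(base, "pgsql.build")
    locked = os.path.join(build, "tmp_check", "pg_subtrans")
    os.makedirs(locked)
    os.chmod(locked, 0)  # imitate the directory left behind by the test
    force_remove_tree(build)
    leftover = os.path.exists(build)
    os.rmdir(base)
    return leftover


if __name__ == "__main__":
    print("left over after cleanup:", demo())
```

The explicit check after shutil.rmtree is mostly redundant in Python, since rmtree raises on failure by default; it is kept here only to mirror the "refuse to run" suggestion.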

Re: backup manifests and contemporaneous buildfarm failures

2020-04-05 Thread Fabien COELHO
Hello, Do I need to precede those with some recursive chmod commands? Perhaps the client should refuse to run if there is still something left after these. I think the latter would be a very good idea, just so that this sort of failure is less obscure. Not sure about whether a recursive chm

Re: backup manifests and contemporaneous buildfarm failures

2020-04-05 Thread Tom Lane
Andrew Dunstan writes: > Hmm, the buildfarm client does this at the beginning of each run to > remove anything that might be left over from a previous run: > rmtree("inst"); > rmtree("$pgsql") unless ($from_source && !$use_vpath); Right, the point is precisely that some versions of rmtre

Re: backup manifests and contemporaneous buildfarm failures

2020-04-05 Thread Andrew Dunstan
On 4/5/20 9:10 AM, Mikael Kjellström wrote: > On 2020-04-04 04:43, Robert Haas wrote: > >> I think I've done about as much as I can do for tonight, though. Most >> things are green now, and the ones that aren't are failing because of >> stuff that is at least plausibly fixed. By morning it should

Re: backup manifests and contemporaneous buildfarm failures

2020-04-05 Thread Mikael Kjellström
On 2020-04-04 04:43, Robert Haas wrote: I think I've done about as much as I can do for tonight, though. Most things are green now, and the ones that aren't are failing because of stuff that is at least plausibly fixed. By morning it should be clearer how much broken stuff is left, although that

Re: backup manifests and contemporaneous buildfarm failures

2020-04-04 Thread Thomas Munro
On Sun, Apr 5, 2020 at 2:36 AM Robert Haas wrote: > eelpout is unhappy because: > > +WARNING: could not remove shared memory segment > "/PostgreSQL.248989127": No such file or directory > +WARNING: could not remove shared memory segment > "/PostgreSQL.1450751626": No such file or directory Seem

Re: backup manifests and contemporaneous buildfarm failures

2020-04-04 Thread Tom Lane
Robert Haas writes: > On Sat, Apr 4, 2020 at 10:57 AM Tom Lane wrote: >> What is odd is that >> (AFAIR) we've never seen this before. Maybe somebody recently added >> an error cursor callback in a place that didn't have it before, and >> is involved in SQL-function processing? None of the commi

Re: backup manifests and contemporaneous buildfarm failures

2020-04-04 Thread Robert Haas
On Sat, Apr 4, 2020 at 10:57 AM Tom Lane wrote: > It's not so surprising that we could get a different result that way > from a CLOBBER_CACHE_ALWAYS animal like hyrax, since CCA-forced > cache reloads would cause extra stack expenditure at a lot of places. > And it could vary depending on totally

Re: backup manifests and contemporaneous buildfarm failures

2020-04-04 Thread Tom Lane
Robert Haas writes: > hyrax's last run was before any of this happened, so it seems to have > an unrelated problem. The last two runs, three and six days ago, both > failed like this: > -ERROR: stack depth limit exceeded > +ERROR: stack depth limit exceeded at character 8 > Not sure what that'

Re: backup manifests and contemporaneous buildfarm failures

2020-04-04 Thread Robert Haas
On Fri, Apr 3, 2020 at 10:43 PM Robert Haas wrote: > I think I've done about as much as I can do for tonight, though. Most > things are green now, and the ones that aren't are failing because of > stuff that is at least plausibly fixed. By morning it should be > clearer how much broken stuff is le

Re: backup manifests and contemporaneous buildfarm failures

2020-04-04 Thread Robert Haas
On Fri, Apr 3, 2020 at 11:06 PM Andres Freund wrote: > On 2020-04-03 20:48:09 -0400, Robert Haas wrote: > > 'serinus' is also failing. This is less obviously related: > > Hm. Tests passed once since then. Yeah, but conchuela also failed once in what I think was a similar way. I suspect the fix I

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Petr Jelinek
On 04/04/2020 05:06, Andres Freund wrote: Hi, Peter, Petr, CCed you because it's probably a bug somewhere around the initial copy code for logical replication. On 2020-04-03 20:48:09 -0400, Robert Haas wrote: 'serinus' is also failing. This is less obviously related: Hm. Tests passed once s

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Andres Freund
Hi, Peter, Petr, CCed you because it's probably a bug somewhere around the initial copy code for logical replication. On 2020-04-03 20:48:09 -0400, Robert Haas wrote: > 'serinus' is also failing. This is less obviously related: Hm. Tests passed once since then. > 2020-04-04 02:08:57.299 CEST

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 9:52 PM Tom Lane wrote: > Robert Haas writes: > > 'prairiedog' is also unhappy, and it looks related: > > Yeah, gaur also failed in the same place. Both of those are > alignment-picky 32-bit hardware, so I'm thinking the problem is > pg_gmtime() trying to fetch a 64-bit pg

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Robert Haas writes: > Interestingly, on my machine, rmtree coped with a mode 0 directory > just fine, but mode 0400 was more than its tiny brain could handle, so > the originally committed fix had code to revert 0400 back to 0700, but > I didn't add similar code to revert from 0 back to 0700 becau

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Robert Haas writes: > 'prairiedog' is also unhappy, and it looks related: Yeah, gaur also failed in the same place. Both of those are alignment-picky 32-bit hardware, so I'm thinking the problem is pg_gmtime() trying to fetch a 64-bit pg_time_t from an insufficiently aligned address. I'm trying

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 8:12 PM Tom Lane wrote: > Yeah, so it would seem. The buildfarm script uses rmtree to clean out > the old build tree. The man page for File::Path suggests (but can't > quite bring itself to say in so many words) that by default, rmtree > will adjust the permissions on targ

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 6:13 PM Tom Lane wrote: > Locally, I observe that "make clean" in src/bin/pg_validatebackup fails > to clean up the tmp_check directory left behind by "make check". Fixed. I also tried to fix 'lapwing', which was complaining about a call to pg_gmtime, saying that it

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Robert Haas writes: > On Fri, Apr 3, 2020 at 6:48 PM Tom Lane wrote: >> I'm guessing that we're looking at a platform-specific difference in >> whether "rm -rf" fails outright on an unreadable subdirectory, or >> just tries to carry on by unlinking it anyway. > My intention was that it would be

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 5:58 PM Fabien COELHO wrote: > seawasp just failed the same way. Good news, I can see "configure" under > "HEAD/pgsql". Ah, good. > The only strange thing under buildroot I found is: > > HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/back

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 6:48 PM Tom Lane wrote: > I'm guessing that we're looking at a platform-specific difference in > whether "rm -rf" fails outright on an unreadable subdirectory, or > just tries to carry on by unlinking it anyway. My intention was that it would be cleaned by the TAP framework

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
I wrote: > I'm guessing that we're looking at a platform-specific difference in > whether "rm -rf" fails outright on an unreadable subdirectory, or > just tries to carry on by unlinking it anyway. Yeah... on my RHEL6 box, "make check" cleans up the working directories under tmp_check, but on a Fre

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Thomas Munro writes: > Same here, on elver. I see pg_subtrans has been chmod(0)'d, > presumably by the perl subroutine mutilate_open_directory_fails. I > see this in my inbox (the build farm wrote it to stderr or stdout > rather than the log file): > cannot chdir to child for > pgsql.build/src/

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Stephen Frost
Greetings, * Thomas Munro (thomas.mu...@gmail.com) wrote: > On Sat, Apr 4, 2020 at 11:13 AM Tom Lane wrote: > > Fabien COELHO writes: > > > The only strange thing under buildroot I found is: > > > > > HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_d

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Thomas Munro
On Sat, Apr 4, 2020 at 11:13 AM Tom Lane wrote: > Fabien COELHO writes: > > The only strange thing under buildroot I found is: > > > HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/ > > > this last directory perms are d

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Alvaro Herrera
On 2020-Apr-03, Tom Lane wrote: > I wonder if VPATH versus not-VPATH might be a relevant factor ... Oh, absolutely. The ones that failed show, in the last successful run, the configure line invoked as "./configure", while the animals that are still running are invoking configure from some other

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Fabien COELHO writes: > The only strange thing under buildroot I found is: > HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/ > this last directory perms are d- which seems to break cleanup. Locally, I observe tha

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Fabien COELHO
Hello Robert, Done now. Meanwhile, two more machines have reported the mysterious message: sh: ./configure: not found ...that first appeared on spurfowl a few hours ago. The other two machines are eelpout and elver, both of which list Thomas Munro as a maintainer. spurfowl lists Stephen Fros

backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
[ splitting this off into a separate thread ] On Fri, Apr 3, 2020 at 5:07 PM Robert Haas wrote: > I'll go see about adding that. Done now. Meanwhile, two more machines have reported the mysterious message: sh: ./configure: not found ...that first appeared on spurfowl a few hours ago. The othe