On Wed, Jan 08, 2020 at 02:50:53PM +0530, Amit Khandekar wrote:
> On Sun, 5 Jan 2020 at 00:21, Noah Misch <n...@leadboat.com> wrote:
> > The buildfarm client can capture stack traces, but it currently doesn't do so
> > for TAP test suites (search the client code for get_stack_trace).  If someone
> > feels like writing a fix for that, it would be a nice improvement.  Perhaps,
> > rather than having the client code know all the locations where core files
> > might appear, failed runs should walk the test directory tree for core files?
>
> I think this might end up having the same code to walk the directory
> spread out on multiple files. Instead, I think in the build script, in
> get_stack_trace(), we can do an equivalent of "find <inputdir> -name
> "*core*" , as against the current way in which it looks for core files
> only in the specific data directory.

Agreed.

> Noah, is it possible to run a patch'ed build script once I submit a
> patch, so that we can quickly get the stack trace ? I mean, can we do
> this before getting the patch committed ? I guess, we can run the
> build script with a single branch specified, right ?

Yes to all questions, but it would not have helped in this case.  First, v10
deletes PostgresNode base directories at the end of this test file, despite
the failure[1].  Second, the stack trace was minimal:

  (gdb) bt
  #0  0xd011119c in extend_brk () from /usr/lib/libc.a(shr.o)

Even so, a web search for "extend_brk" led to the answer.  By default, 32-bit
AIX binaries get only 256M of RAM for stack and sbrk.  The new regression test
used more than that, hence this crash.  Setting LDR_CNTRL=MAXDATA=0x80000000
in the environment cured the crash.  I've put that in the buildfarm member
configuration and started a new run.  (PostgreSQL documentation actually
covers this problem:
https://www.postgresql.org/docs/devel/installation-platform-notes.html#INSTALLATION-NOTES-AIX)

[1] It has the all_tests_passing() logic in an attempt to stop this.  I'm
guessing it didn't help because the file failed by calling die "connection
error: ...", not by reporting a failure to Test::More via ok(0) or similar.
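
For illustration only, the tree walk discussed above (the equivalent of find <inputdir> -name "*core*") might look roughly like the sketch below.  It is written in Python for brevity and is not the buildfarm client's code.  The function name find_core_files and the demo directory layout are hypothetical.  Unlike a bare find, this sketch reports only non-directory entries, on the assumption that a directory whose name merely contains "core" is not a dump.

```python
import os
import tempfile


def find_core_files(root):
    """Return sorted paths of non-directory entries under root whose
    file name contains "core" (hypothetical helper, not buildfarm code)."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Same pattern as -name "*core*": substring match on the
            # file name only, never on the directory part of the path.
            if "core" in name:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # Small demo on a throwaway tree; the layout is invented.
    demo_root = tempfile.mkdtemp()
    os.makedirs(os.path.join(demo_root, "tmp_check", "data"))
    open(os.path.join(demo_root, "tmp_check", "data", "core.1234"), "w").close()
    for path in find_core_files(demo_root):
        print(path)
```

A caller would hand each returned path to whatever already produces the backtrace.  A substring match this loose can also pick up unrelated files with "core" in their names, so a real implementation would probably want a tighter pattern.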