Over in the thread at [1] we were wondering how come buildfarm member hyrax has suddenly started to fail like this:
diff -U3 /home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/expected/errors.out /home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/results/errors.out --- /home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/expected/errors.out 2018-11-21 13:48:48.340000000 -0500 +++ /home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/results/errors.out 2020-04-04 04:48:16.704699045 -0400 @@ -446,4 +446,4 @@ 'select infinite_recurse()' language sql; \set VERBOSITY terse select infinite_recurse(); -ERROR: stack depth limit exceeded +ERROR: stack depth limit exceeded at character 8 I've now looked into this, and found that it's not at all hard to duplicate; compile HEAD with -DCLOBBER_CACHE_ALWAYS, and run "select infinite_recurse()", and you'll likely get the changed error message. (The lack of other buildfarm failures is probably just because we have so few animals doing CLOBBER_CACHE_ALWAYS builds frequently.) The issue seems indeed to have been triggered by 8f59f6b9c0, because that inserted an equal() call into recomputeNamespacePath(). equal() includes a check_stack_depth() call. We get the observed message if this call is the one where the stack limit is hit, and it is invoked inside ParseFuncOrColumn(), which has set up a parser error callback to point at the infinite_recurse() call that it's trying to resolve. That callback's been there a long time of course, so we may conclude that no other code path reached from ParseFuncOrColumn contains a stack depth check, or we'd likely have seen this before. It's a bit surprising perhaps that we run out of stack here and not somewhere else; but ParseFuncOrColumn and its subroutines consume quite a lot of stack, because of FUNC_MAX_ARGS-sized local arrays, so it's not *that* surprising. So, what to do to re-stabilize the regression tests? Even if we wanted to revert 8f59f6b9c0, that'd be kind of a band-aid fix, because there are lots of other possible ways that a parser error callback could be active at the point of the error. A few other possibilities are: 1. Change the test to do "\set VERBOSITY sqlstate" so that all that is printed is ERROR: 54001 ERRCODE_STATEMENT_TOO_COMPLEX is used in few enough places that this wouldn't be too much of a loss of specificity. (Or we could give stack overflow its very own ERRCODE.) 2. Hack pcb_error_callback so that it suppresses the error position report for ERRCODE_STATEMENT_TOO_COMPLEX, as it already does for ERRCODE_QUERY_CANCELED. That seems pretty unpleasant though. 3. Create a separate expected-file to match the variant output. This would be a maintenance problem, but we could ameliorate that by moving the test to its own regression script, which was something that'd already been proposed to get around the PPC64 Linux kernel signal-handling bug that's been causing intermittent failures on most of the PPC64 buildfarm animals [2]. On the whole I find #1 the least unpleasant, as well as the most likely to forestall future variants of this issue. It won't dodge the PPC64 problem, but I'm willing to keep living with that one for now. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/CA%2BTgmoaUOS5X64nKgFxNV7JHN4sRkNAJYW2gHz-LMb0Ej4xHig%40mail.gmail.com [2] https://www.postgresql.org/message-id/27924.1571068231%40sss.pgh.pa.us