Re: Subscription tests fail under CLOBBER_CACHE_ALWAYS

Andrew Dunstan Wed, 19 May 2021 11:36:22 -0700


On 5/18/21 11:03 PM, Michael Paquier wrote:
>
>> 3. Once the subscriber1 postmaster has exited, the TAP
>> test will eventually time out, and then this happens:
>>
>> [.. logs ..]
>>
>> That is, because we failed to shut down subscriber1, the
>> test script neglects to shut down subscriber2, and now
>> things just sit indefinitely.  So that's a robustness
>> problem in the TAP infrastructure, rather than a bug in
>> PG proper; but I still say it's a bug that needs fixing.
> This one comes down to teardown_node() using system_or_bail(),
> leaving things unfinished.  I guess that we could be more aggressive
> and ignore failures within the END block of PostgresNode.pm when we
> have a non-zero exit code and not all the tests have passed.

Yeah, this area needs substantial improvement. I have seen similar nasty
hangs, where the script waits forever for some process that never got the
shutdown message. At the very least we probably need some way of making
sure the END handler doesn't abort early. Maybe PostgresNode::stop() needs
a mode that handles failure more gracefully. Maybe the END handler needs to
try shutting down all the nodes, and only call BAIL_OUT after it has tried
all of them and at least one has failed. But that might still leave us
work to do on failures occurring before the END block runs.
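A rough sketch of that "stop everything first, bail afterwards" idea
follows. It is written in Python rather than Perl purely for illustration.
stop_all_nodes, end_handler and ShutdownError are hypothetical names, not
PostgresNode API. Each node's stop callable stands in for something like
PostgresNode::stop(), and raising from it models a failed shutdown.

```python
from typing import Callable, List, Tuple


class ShutdownError(Exception):
    """Hypothetical stand-in for giving up via BAIL_OUT."""


def stop_all_nodes(nodes: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Try to stop every node, even if an earlier one fails.

    Returns a description of each failed shutdown (empty list if all succeeded).
    """
    failed = []
    for name, stop in nodes:
        try:
            stop()  # hypothetical per-node stop; raises on failure
        except Exception as exc:
            # Record the failure but keep going, so later nodes
            # (e.g. subscriber2) are not left running indefinitely.
            failed.append(f"{name}: {exc}")
    return failed


def end_handler(nodes: List[Tuple[str, Callable[[], None]]]) -> None:
    """Hypothetical END-block logic: bail only after every node was tried."""
    failed = stop_all_nodes(nodes)
    if failed:
        raise ShutdownError("could not stop: " + "; ".join(failed))
```

The point of the design is that one stuck or crashed node no longer
prevents the remaining nodes from receiving their shutdown request; the
overall failure is still reported, just later.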


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com