I discovered $SUBJECT after wondering why hyrax hadn't reported in recently, and trying to run check-world under CCA to see if anything got stuck. Indeed it did --- although this doesn't explain the radio silence from hyrax, because that animal doesn't run any TAP tests. (Neither does avocet, which I think is the only other active CCA critter. So this could have been broken for a very long time.)
I count three distinct bugs that were exposed by this attempt:

1. In the part of 013_partition.pl that tests firing AFTER triggers on partitioned tables, we have a case of continuing to access a relcache entry that's already been closed. (I'm not quite sure why prion's -DRELCACHE_FORCE_RELEASE hasn't exposed this.) It looks to me like instead we had a relcache reference leak before f3b141c48, but now, the only relcache reference count on a partition child table is dropped by ExecCleanupTupleRouting, which logical/worker.c invokes before it fires triggers on that table. Kaboom. This might go away if worker.c weren't so creatively different from the other code paths concerned with executor shutdown.

2. Said bug causes a segfault in the apply worker process. This causes the parent postmaster to give up and die. I don't understand why we don't treat that like a crash in a regular backend, considering that an apply worker is running largely user-defined code.

3. Once the subscriber1 postmaster has exited, the TAP test will eventually time out, and then this happens:

    timed out waiting for catchup at t/013_partition.pl line 219.
    ### Stopping node "publisher" using mode immediate
    # Running: pg_ctl -D /Users/tgl/pgsql/src/test/subscription/tmp_check/t_013_partition_publisher_data/pgdata -m immediate stop
    waiting for server to shut down.... done
    server stopped
    # No postmaster PID for node "publisher"
    ### Stopping node "subscriber1" using mode immediate
    # Running: pg_ctl -D /Users/tgl/pgsql/src/test/subscription/tmp_check/t_013_partition_subscriber1_data/pgdata -m immediate stop
    pg_ctl: PID file "/Users/tgl/pgsql/src/test/subscription/tmp_check/t_013_partition_subscriber1_data/pgdata/postmaster.pid" does not exist
    Is server running?
    Bail out! system pg_ctl failed

That is, because we failed to shut down subscriber1, the test script neglects to shut down subscriber2, and now things just sit indefinitely.
So that's a robustness problem in the TAP infrastructure, rather than a bug in PG proper; but I still say it's a bug that needs fixing.

			regards, tom lane
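For illustration only, here is a minimal sketch of the teardown behavior that would avoid the hang in item 3: attempt to stop every node, record any failures, and only then report them, rather than bailing out at the first failed stop and leaving later nodes running. This is written in Python, not the Perl TAP infrastructure, and the `Node` class and `stop_all_nodes` function are hypothetical stand-ins; nothing here reflects the actual PostgresNode API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for a test cluster node (not PostgresNode)."""
    name: str
    stop_fn: Callable[[], None]  # raises RuntimeError if the stop fails


def stop_all_nodes(nodes: List[Node]) -> List[str]:
    """Try to stop every node, even if some stop attempts fail.

    Returns the names of nodes whose stop failed, so the caller can
    report the failure only after all nodes have been dealt with.
    """
    failures = []
    for node in nodes:
        try:
            node.stop_fn()
        except RuntimeError as exc:
            # Record the failure and keep going, so that a dead
            # subscriber1 does not leave subscriber2 running forever.
            failures.append(node.name)
            print(f"# stopping node {node.name!r} failed: {exc}")
    return failures


# Usage: simulate the scenario above, where subscriber1's postmaster
# has already died and its stop attempt therefore fails.
stopped = []


def ok(name: str) -> Callable[[], None]:
    return lambda: stopped.append(name)


def missing_pidfile() -> None:
    raise RuntimeError("PID file does not exist")


nodes = [
    Node("publisher", ok("publisher")),
    Node("subscriber1", missing_pidfile),
    Node("subscriber2", ok("subscriber2")),
]
failed = stop_all_nodes(nodes)
```

Here `failed` ends up as `["subscriber1"]` while subscriber2 still gets stopped; the harness can then bail out with an error without leaving any postmaster behind.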