Hi, cfbot, the buildfarm and locally I have seen 100_bugs.pl fail occasionally. Just rarely enough that I never got around to looking into it for real.
Just now there was another failure on master: https://cirrus-ci.com/task/5279589287591936 [01:00:49.441] ok 1 - index predicates do not cause crash [01:00:49.441] ok 2 - update to temporary table without replica identity with FOR ALL TABLES publication [01:00:49.441] ok 3 - update to unlogged table without replica identity with FOR ALL TABLES publication [01:00:49.441] # test failed [01:00:49.441] --- stderr --- [01:00:49.441] # poll_query_until timed out executing this query: [01:00:49.441] # SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's'); [01:00:49.441] # expecting this output: [01:00:49.441] # t [01:00:49.441] # last actual query output: [01:00:49.441] # f [01:00:49.441] # with stderr: [01:00:49.441] # Tests were run but no plan was declared and done_testing() was not seen. [01:00:49.441] # Looks like your test exited with 29 just after 3. [01:00:49.441] [01:00:49.441] (test program exited with status code 29) the regress log: https://api.cirrus-ci.com/v1/artifact/task/5279589287591936/testrun/build-32/testrun/subscription/100_bugs/log/regress_log_100_bugs and twoway's log: https://api.cirrus-ci.com/v1/artifact/task/5279589287591936/testrun/build-32/testrun/subscription/100_bugs/log/100_bugs_twoways.log We see t2 added to the publication: 2023-01-24 00:57:30.099 UTC [73654][client backend] [100_bugs.pl][7/5:0] LOG: statement: ALTER PUBLICATION testpub ADD TABLE t2 And that *then* "t" was synchronized: 2023-01-24 00:57:30.102 UTC [73640][logical replication worker] LOG: logical replication table synchronization worker for subscription "testsub", table "t" has finished and then that the refresh was issued: 2023-01-24 00:57:30.128 UTC [73657][client backend] [100_bugs.pl][5/10:0] LOG: statement: ALTER SUBSCRIPTION testsub REFRESH PUBLICATION And we see a walsender starting and the query to get the new tables being executed: 2023-01-24 00:57:30.139 UTC [73660][walsender] [testsub][6/8:0] LOG: statement: SELECT DISTINCT t.schemaname, t.tablename , t.attnames FROM pg_catalog.pg_publication_tables t WHERE t.pubname IN ('testpub') And that's it, the rest of the time is just polling. Perhaps wait_for_subscription_sync() should dump the set of rel states to make something like this more debuggable? The fact that the synchronization for t finished just before the refresh makes me wonder if a wakeup or a cache invalidation got lost? Greetings, Andres Freund