Hi, I think I'll remove the "Catalog xmins should advance after standby logical slot fetches the changes." test. For one, it takes a long time (due to the 2000 psqls). But more importantly, it's simply not testing anything that's reliable:
1) There's no guarantee that I can see that catalog_xmin needs to increase, just because we called pg_logical_slot_get_changes() once. 2) $node_master->wait_for_catchup($node_standby, 'replay', $node_master->lsn('flush')); is definitely not OK. It only happens to work by accident / the 2000 iterations. There might not be any logical changes associated with that LSN, so there'd might not be anything to replay. That's especially true for the second wait_for_catchup - there haven't been any logical changes since the last wait. The test hangs reliably for me if I replace the 2000 with 2. Kinda looks like somebody just tried to add more and more inserts to make the test not fail, without addressing the reliability issues. That kind of thing rarely works out well, because it tends to be very machine specific to get the timing right. And it makes the test take forever. TBH, most of 024_standby_logical_decoding_xmins.pl looks like they've been minimally hacked up the tests from Craig's quite different patch, without adjusting them. There's stuff like: # Create new slots on the standby, ignoring the ones on the master completely. # # This must succeed since we know we have a catalog_xmin reservation. We # might've already sent hot standby feedback to advance our physical slot's # catalog_xmin but not received the corresponding xlog for the catalog xmin # advance, in which case we'll create a slot that isn't usable. The calling # application can prevent this by creating a temporary slot on the master to # lock in its catalog_xmin. For a truly race-free solution we'd need # master-to-standby hot_standby_feedback replies. # # In this case it won't race because there's no concurrent activity on the # master. # This issue doesn't exist in the patch. There's also no test for a recovery conflict due to row removal. Despite that being a substantial part of the patchset. I'm tempted to throw out 024 - all of its tests seem fragile and prove little. And then add a few more tests to 025 (and renaming it). Greetings, Andres Freund