We've been seeing $subject since commit 1bc0100d2, with little clue as to the cause. Previous attempts to fix it by preventing autovacuum from running on the relevant tables didn't seem to help.
I have now managed to reproduce the issue reliably enough to study it. (It turns out that on longfin's host, running the buildfarm script *under cron* produces the failure a reasonable percentage of the time. The identical test case, launched from an interactive shell, never fails. I speculate that the kernel scheduler treats cron jobs differently.)

It is in fact a timing issue, and what triggers it is there being an active autovacuum task in another database. The fact that the buildfarm script uses USE_MODULE_DB in the contrib tests greatly increases the surface area of the problem, since that creates a lot more databases that autovacuum could be active in. Once I realized that, I found that it can trivially be reproduced by hand, in a "make installcheck" test, simply by having an open transaction in another DB in the cluster: for instance, run "select pg_sleep(60);" in another database, and meanwhile run "make installcheck" in postgres_fdw.

So now, drilling down to the specific problem: what we're seeing is that the plans for some queries change because the "remote" server reports a different rowcount estimate than usual for

    EXPLAIN SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid
      FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE

Normally the estimate is one row, but in the failure cases it's 12 or so. This estimate is coming out of ineq_histogram_selectivity, of course. Furthermore, the highest value actually present in the histogram is 1000, because that was the highest value of "C 1" when the test did ANALYZE up at the top. That means we'll invoke get_actual_variable_range to try to identify the actual current maximum. Most of the time, it returns 2010, which is correct, and we end up estimating that only about one row will exceed 2000. However, in the failure cases, what's getting extracted from the index is 9999.
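To see why the substituted maximum matters so much, here's a back-of-the-envelope sketch (not the actual ineq_histogram_selectivity code; the bucket boundaries, bucket fraction, and row count below are made up for illustration) of the last-bucket linear interpolation that produces the rowcount:

```python
# Toy model of the last-bucket linear interpolation used for a
# "col > cutoff" estimate; all numbers are illustrative, not taken
# from the real "S 1"."T 1" statistics.

def rows_above(cutoff, bucket_lo, bucket_hi, bucket_frac, nrows):
    """Estimate rows with value > cutoff, assuming the cutoff falls in a
    histogram bucket spanning [bucket_lo, bucket_hi] that holds
    bucket_frac of the table's nrows rows, spread uniformly."""
    frac_above = (bucket_hi - cutoff) / (bucket_hi - bucket_lo)
    return frac_above * bucket_frac * nrows

NROWS = 1000          # table size at ANALYZE time (hypothetical)
BUCKET_FRAC = 0.01    # last bucket holds 1% of the rows (hypothetical)
LAST_BOUNDARY = 990   # lower bound of the last bucket (hypothetical)

# get_actual_variable_range replaces the stale histogram maximum (1000)
# with whatever maximum the index scan reports:
good = rows_above(2000, LAST_BOUNDARY, 2010, BUCKET_FRAC, NROWS)
bad = rows_above(2000, LAST_BOUNDARY, 9999, BUCKET_FRAC, NROWS)

print(good)  # tiny: almost none of the bucket lies above 2000
print(bad)   # large: most of the bucket lies above 2000
```

With a reported maximum of 2010, only the sliver of the last bucket between 2000 and 2010 counts, giving an estimate of about one row; with 9999, nearly the whole bucket lies above 2000 and the estimate jumps by an order of magnitude.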
That's because we'd inserted and deleted that value further up in the test, and if there's been an open transaction holding back RecentGlobalXmin, then SnapshotNonVacuumable is going to consider that entry still good. This value is enough larger than 2000 to move the selectivity estimate appreciably, and then kaboom.

This theory successfully explains the observed fact that some buildfarm failures show differences in two query plans, while others show a difference just in the first one. In the latter cases, the external transaction ended, so that RecentGlobalXmin could advance, in between.

What I propose to do to fix the instability is to change the test stanza that uses 9999 as a key-chosen-at-random to use something less far away from the normal range of "C 1" values, so that whether it's still visible to get_actual_variable_range has less effect on this selectivity estimate. That's a hack, for sure, but I don't see any other fix that's much less of a hack.

			regards, tom lane
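PS: for anyone trying to follow along, here's a toy model (in Python, with a completely made-up tuple representation and simplified XID arithmetic; the real logic is C code in the snapshot-visibility and index-scan machinery) of why the deleted 9999 entry still comes back from the index while an old transaction holds RecentGlobalXmin back:

```python
# Toy model: SnapshotNonVacuumable treats a deleted tuple as still
# interesting if the deleting XID is not yet older than RecentGlobalXmin,
# i.e. VACUUM would not be allowed to remove it yet.

def nonvacuumable(tuple_xmax, recent_global_xmin):
    """A live tuple (xmax is None) always passes; a deleted tuple passes
    until its deleter is older than RecentGlobalXmin."""
    return tuple_xmax is None or tuple_xmax >= recent_global_xmin

def actual_max(index_entries, recent_global_xmin):
    """Mimic get_actual_variable_range: scan from the high end of the
    index and return the first key whose tuple passes the snapshot test."""
    for key, xmax in sorted(index_entries, reverse=True):
        if nonvacuumable(xmax, recent_global_xmin):
            return key
    return None

# "C 1" values 1..2010 are live; 9999 was inserted and then deleted
# (say, by XID 500).
entries = [(k, None) for k in range(1, 2011)] + [(9999, 500)]

# An open transaction elsewhere holds RecentGlobalXmin back at 400,
# so the deleted entry still passes the test and 9999 is returned:
print(actual_max(entries, recent_global_xmin=400))
# Once that transaction ends and the horizon advances past 500,
# the deleted entry fails the test and we get the correct 2010:
print(actual_max(entries, recent_global_xmin=600))
```

This also matches the two failure shapes: while the horizon is held back the scan keeps answering 9999, and as soon as it advances the answer snaps back to 2010.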