For the past five days or so, wrasse has been intermittently failing due to unexpectedly not using an Index Only Scan plan in the create_index test [1], eg
@@ -1910,11 +1910,15 @@
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
 ORDER BY unique1;
-                       QUERY PLAN                        
---------------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+         ->  Bitmap Index Scan on tenk1_unique1
+               Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)
 
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)

The most probable explanation for this seems to be that tenk1's
pg_class.relallvisible value hasn't been set high enough to make an
IOS look cheaper than the alternatives.  Where that ought to be
getting set is the "VACUUM ANALYZE tenk1" step in test_setup.sql.
It's plausible I guess that a background autovacuum is preventing
that command from setting relallvisible as high as it ought to be ---
but if so, why are we only seeing two plans changing, on only one
animal?

But what I'm really confused about is that this test arrangement has
been stable since early February.  Why has wrasse suddenly started
showing a 25% failure rate when it never failed this way before that?
Somebody has to have recently committed a change that affects this.
Checking the commit log up to the onset of the failures on 8 April,
I only see two plausible candidates:

* shared-memory pgstats
* Peter's recent VACUUM changes

Any connection to pgstats is, um, pretty obscure.  I'd finger the
VACUUM changes as a more likely trigger, except that the last
interesting-looking one was f3c15cbe5 on 3 April, and wrasse got
through "make check" 38 times after that before its first failure of
this kind.  That doesn't square with the 25% failure rate since then,
so I'm kind of forced to the conclusion that the pgstats work changed
some behavior that it should not have.  Any ideas?

I'm tempted to add something like

SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1';

so that we can confirm or refute the theory that relallvisible is the
driving factor.

			regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-08%2003%3A48%3A30
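
For illustration, a minimal sketch of how that probe and its expected
output might look in the create_index test.  The "t" row is an
assumption about what a healthy run would print, i.e. that the VACUUM
in test_setup.sql has left relallvisible equal to relpages; a run
exhibiting the failure above would presumably print "f" instead:

SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1';
 ?column? 
----------
 t
(1 row)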