On Mon, Jun 24, 2024 at 4:27 AM Heikki Linnakangas <hlinn...@iki.fi> wrote:
>
> On 21/06/2024 03:02, Peter Geoghegan wrote:
> > On Thu, Jun 20, 2024 at 7:42 PM Melanie Plageman
> > <melanieplage...@gmail.com> wrote:
> >
> >> The repro forces a round of index vacuuming after the standby
> >> reconnects and before pruning a dead tuple whose xmax is older than
> >> OldestXmin.
> >>
> >> At the end of the round of index vacuuming, _bt_pendingfsm_finalize()
> >> calls GetOldestNonRemovableTransactionId(), thereby updating the
> >> backend's GlobalVisState and moving maybe_needed backwards.
> >
> > Right. I saw details exactly consistent with this when I used GDB
> > against a production instance.
> >
> > I'm glad that you were able to come up with a repro that involves
> > exactly the same basic elements, including index page deletion.
>
> Would it be possible to make it robust so that we could always run it
> with "make check"? This seems like an important corner case to
> regression test.

I'd have to look into how to ensure I can stabilize some of the parts that seem prone to flaking. I can probably stabilize the vacuum bit with a query of pg_stat_activity making sure it is waiting to acquire the cleanup lock.

I don't, however, see a good way around the large amount of data required to trigger more than one round of index vacuuming. I could generate the data more efficiently than I am doing here (generate_series() in the FROM clause). Perhaps with a COPY? I know it is too slow now to go in an ongoing test, but I don't have an intuition around how fast it would have to be to be acceptable. Is there a set of additional tests that are slower that we don't always run? I didn't follow how the wraparound test ended up, but that seems like one that would have been slow.

- Melanie