On Tue, Aug 11, 2020 at 2:55 AM Masahiko Sawada
<masahiko.saw...@2ndquadrant.com> wrote:
>
> On Tue, 11 Aug 2020 at 07:56, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> >
> > Last week, James reported to us that after promoting a replica, some
> > seqscan was taking a huge amount of time; on investigation he saw
> > that there was a high rate of FPI_FOR_HINT WAL records generated by
> > the seqscan. Looking closely at the generated traffic,
> > HEAP_XMIN_COMMITTED was being set on some tuples.
> >
> > Now this may seem obvious to some as a drawback of the current
> > system, but I was taken by surprise. The problem was simply that
> > when a page is examined by a seqscan, we do
> > HeapTupleSatisfiesVisibility of each tuple in isolation, and for
> > each tuple we call SetHintBits(). The FPI happens only for the
> > first tuple; by the time we get to the second one, the page is
> > already dirty, so there's no need to emit another FPI. But the FPI
> > we sent had the bit set only on the first tuple ... so the standby
> > will not have the bit set for any subsequent tuple. And after
> > promotion, the standby will have to set the bits for all those
> > tuples itself, unless you happened to dirty the page again later
> > for other reasons.
> >
> > So if you have some table whose tuples gain hint bits in bulk, and
> > the pages are rarely modified afterwards, and you promote before
> > those pages are frozen, then you may end up with a massive number
> > of pages that will need hinting after the promote, which can become
> > troublesome.
>
> Did the case you observed not use hot standby? I thought the impact
> of this issue could be somewhat alleviated in hot standby cases,
> since read queries on the hot standby can set hint bits.
We do have hot standby enabled, and large queries that may do seq scans
do sometimes run against a replica. But there are multiple replicas
(each of which would have to have the bits set), and the replica that
gets promoted in our topology isn't guaranteed to be one that has seen
those reads.

James
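P.S. For anyone who wants to see the effect in miniature: below is a
toy, standalone C sketch of my understanding of the mechanism. It is
deliberately simplified and is not PostgreSQL source. The real code
path is SetHintBits() -> MarkBufferDirtyHint() ->
XLogSaveBufferForHint(), and the FPI is only emitted when data
checksums or wal_log_hints are enabled; all names below are invented
for the illustration.

    /*
     * Toy model (not PostgreSQL code): why an FPI_FOR_HINT carries only
     * the hint bits that were already set when the page first went dirty.
     */
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define TUPLES_PER_PAGE     4
    #define HEAP_XMIN_COMMITTED 0x01

    typedef struct
    {
        bool          dirty;
        unsigned char infomask[TUPLES_PER_PAGE]; /* per-tuple hint bits */
    } ToyPage;

    static ToyPage fpi_image;        /* the page image the standby replays */
    static bool    fpi_sent = false;

    /*
     * Stand-in for MarkBufferDirtyHint(): an FPI is taken only on the
     * clean->dirty transition, capturing the page as it is *right now*.
     */
    static void
    toy_mark_buffer_dirty_hint(ToyPage *page)
    {
        if (!page->dirty)
        {
            fpi_image = *page;
            fpi_sent = true;
            printf("FPI_FOR_HINT emitted\n");
            page->dirty = true;
        }
    }

    /* Stand-in for SetHintBits(): set the bit, then dirty the buffer. */
    static void
    toy_set_hint_bits(ToyPage *page, int tupno)
    {
        page->infomask[tupno] |= HEAP_XMIN_COMMITTED;
        toy_mark_buffer_dirty_hint(page);
    }

    int
    main(void)
    {
        ToyPage page;

        memset(&page, 0, sizeof(page));

        /* A seqscan checks visibility tuple by tuple, hinting each one. */
        for (int i = 0; i < TUPLES_PER_PAGE; i++)
            toy_set_hint_bits(&page, i);

        for (int i = 0; i < TUPLES_PER_PAGE; i++)
            printf("tuple %d: primary bit=%d, standby (from FPI) bit=%d\n",
                   i,
                   (page.infomask[i] & HEAP_XMIN_COMMITTED) != 0,
                   fpi_sent ? (fpi_image.infomask[i] & HEAP_XMIN_COMMITTED) != 0 : 0);

        return 0;
    }

Running it prints one "FPI_FOR_HINT emitted" line and then shows the
hint bit present on the standby side only for tuple 0; the rest have it
on the primary alone, which is exactly the state a promoted replica
inherits until something else dirties the page.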