Page freezing, FSM, and WAL replay

Alvaro Herrera Thu, 02 Aug 2018 18:20:47 -0700

We recently had a customer report a very strange problem involving a
very large insert-only table: without explanation, insertions would
stall for several seconds, causing application timeouts, process
accumulation, and other nastiness.


After some investigation, we narrowed this down to happening immediately
after the first VACUUM on the table right after a standby got promoted.
It wasn't at first obvious what the connection between these factors
was, but eventually we realized that VACUUM must have been skipping a
bunch of pages because they had been marked all-frozen previously, so
the FSM was not updated with the correct freespace figures for those
pages.  The FSM pages had been transmitted as full-page images in WAL
before the promotion (because wal_log_hints was enabled), so they
contained optimistic figures for the amount of free space, inherited
from the previous master.  (Because this only happens on the first
change to an FSM page after a checkpoint, it's quite likely that one
page every few thousand or so contains optimistic figures while the
others remain all zeroes, or something like that.)

Before VACUUM, nothing too bad would happen, because the upper layers of
the FSM would not know about those optimistic numbers.  But when VACUUM
does FreeSpaceMapVacuum, it propagates those numbers upwards; as soon as
that happens, inserters looking for pages would be told about those
pages (wrongly catalogued as containing sufficient free space), go to
insert there, and fail because there isn't actually any free space; they
then ask the FSM for another page, lather, rinse, repeat, until all
those pages are catalogued correctly by the FSM, at which point things
continue normally.  (There are many processes doing this chase-up
concurrently and it seems a pretty contentious process, about which see
the last paragraph; it can be seen in pg_xlogdump that it takes several
seconds for things to settle.)
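
The stall mechanism described above can be illustrated with a toy
model.  This is only a sketch, not PostgreSQL's FSM code: ToyFsm and all
toy_* functions are hypothetical names, the tree has 8 slots instead of
the thousands on a real FSM page, and a single binary tree stands in for
the multi-level FSM.  Stale leaf values stay invisible to searches until
a propagation step (standing in for FreeSpaceMapVacuum) copies them
upward; after that, an inserter has to visit and correct every stale
slot before the map tells the truth again.

```c
#include <assert.h>
#include <stddef.h>

#define NLEAVES 8                   /* toy size, one leaf per heap block */
#define NNODES  (2 * NLEAVES - 1)   /* complete binary tree stored in an array */

typedef struct ToyFsm
{
    unsigned char node[NNODES];     /* node[0] is the root; leaves come last */
} ToyFsm;

static size_t toy_leaf_index(size_t blk) { return NLEAVES - 1 + blk; }

/* Set a leaf without touching its parents: models a stale value that
 * arrived in a full-page image while the upper layers know nothing. */
static void toy_set_leaf_only(ToyFsm *fsm, size_t blk, unsigned char cat)
{
    assert(blk < NLEAVES);
    fsm->node[toy_leaf_index(blk)] = cat;
}

/* Recompute every inner node as the max of its children, bottom up:
 * stands in for the upward propagation done by FreeSpaceMapVacuum. */
static void toy_propagate(ToyFsm *fsm)
{
    for (size_t i = NLEAVES - 1; i-- > 0;)
    {
        unsigned char l = fsm->node[2 * i + 1];
        unsigned char r = fsm->node[2 * i + 2];

        fsm->node[i] = (l > r) ? l : r;
    }
}

/* Set a leaf and bubble the new maximum up to the root. */
static void toy_set(ToyFsm *fsm, size_t blk, unsigned char cat)
{
    size_t i = toy_leaf_index(blk);

    assert(blk < NLEAVES);
    fsm->node[i] = cat;
    while (i > 0)
    {
        size_t parent = (i - 1) / 2;
        size_t sibling = (i % 2 == 1) ? i + 1 : i - 1;
        unsigned char a = fsm->node[i];
        unsigned char b = fsm->node[sibling];

        fsm->node[parent] = (a > b) ? a : b;
        i = parent;
    }
}

/* Return a block advertised as having at least "need", or -1 if none. */
static int toy_search(const ToyFsm *fsm, unsigned char need)
{
    size_t i = 0;

    if (fsm->node[0] < need)
        return -1;
    while (i < NLEAVES - 1)
    {
        size_t l = 2 * i + 1;

        i = (fsm->node[l] >= need) ? l : l + 1;
    }
    if (fsm->node[i] < need)
        return -1;                  /* inner nodes were inconsistent */
    return (int) (i - (NLEAVES - 1));
}

/* Count the failed attempts one inserter makes; actual[] holds the real
 * free space per block.  Each failure corrects the slot and searches again. */
static int toy_insert_attempts(ToyFsm *fsm, const unsigned char *actual,
                               unsigned char need)
{
    int failures = 0;

    for (;;)
    {
        int blk = toy_search(fsm, need);

        if (blk < 0 || actual[blk] >= need)
            return failures;        /* extend the relation, or insert here */
        toy_set(fsm, (size_t) blk, actual[blk]);
        failures++;
    }
}
```

In this model the number of failed attempts equals the number of stale
slots, and in the real system many backends pay that cost concurrently
on the same FSM pages.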

After considering several possible solutions, I propose to have
heap_xlog_visible compute free space for any page being marked frozen;
Pavan suggests additionally having heap_xlog_clean compute free space
for all pages, not only those that need redo.  This means that if we
later promote this standby and VACUUM skips all-frozen pages, their FSM
numbers are going to be up-to-date anyway.  Patch attached.
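
For heap_xlog_clean, the change amounts to swapping the condition that
gates the FSM update.  A minimal sketch of the two rules follows.  It
assumes that XLogReadBufferForRedo returns a valid buffer for every
outcome except BLK_NOTFOUND; the Toy* names are hypothetical stand-ins
for the real XLogRedoAction values, redefined so the fragment compiles
on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the XLogRedoAction outcomes (hypothetical names). */
typedef enum ToyRedoAction
{
    TOY_BLK_NEEDS_REDO,     /* WAL record must be applied to the page */
    TOY_BLK_DONE,           /* page is already up to date */
    TOY_BLK_RESTORED,       /* page was restored from a full-page image */
    TOY_BLK_NOTFOUND        /* block does not exist (any more) */
} ToyRedoAction;

/* Assumption: only the "not found" outcome leaves us without a buffer. */
static bool toy_buffer_is_valid(ToyRedoAction action)
{
    switch (action)
    {
        case TOY_BLK_NEEDS_REDO:
        case TOY_BLK_DONE:
        case TOY_BLK_RESTORED:
            return true;
        case TOY_BLK_NOTFOUND:
            return false;
    }
    assert(!"unknown redo action");
    return false;
}

/* Rule before the patch: update the FSM only when the record was replayed. */
static bool toy_update_fsm_old(ToyRedoAction action)
{
    return action == TOY_BLK_NEEDS_REDO;
}

/* Rule in the patch: update the FSM whenever the page could be read. */
static bool toy_update_fsm_new(ToyRedoAction action)
{
    return toy_buffer_is_valid(action);
}
```

The two rules differ for pages restored from a full-page image (and for
pages that were already up to date), which under the old rule never had
their FSM entry refreshed during replay.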


Now, it's possible that the problem also occurs for all-visible pages,
not just all-frozen ones.  I haven't seen that happen; maybe there's
some reason why it cannot.  But fixing both things together is an easy
change in the
proposed patch: just do it on xlrec->flags != 0 rather than checking for
the specific all-frozen flag.
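
The difference between the two conditions can be sketched as follows.
The flag values are believed to match those in
src/include/access/visibilitymap.h but are redefined here so the
fragment compiles standalone; the toy_* function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Redefined locally; assumed to mirror visibilitymap.h. */
#define VISIBILITYMAP_ALL_VISIBLE   0x01
#define VISIBILITYMAP_ALL_FROZEN    0x02
#define TOY_VALID_BITS  (VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN)

/* Condition used in the attached patch: only pages becoming all-frozen. */
static bool toy_update_if_frozen(uint8_t flags, bool know_freespace)
{
    assert((flags & ~TOY_VALID_BITS) == 0);
    return (flags & VISIBILITYMAP_ALL_FROZEN) != 0 && know_freespace;
}

/* Alternative condition: any visibility-map bit being set. */
static bool toy_update_if_any_flag(uint8_t flags, bool know_freespace)
{
    assert((flags & ~TOY_VALID_BITS) == 0);
    return flags != 0 && know_freespace;
}
```

Only the second variant refreshes the FSM for a record that sets
all-visible without all-frozen; both skip the update when the page could
not be read.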

(This problem seems to be made worse by the fact that
RecordAndGetPageWithFreeSpace (or rather fsm_set_and_search) holds an
exclusive lock on the FSM page for the whole duration of the update plus
the search.  So when there are many inserters, they all race to the
update process.  Maybe it'd be less terrible if we released the
exclusive lock after the update and grabbed a shared lock for the search
in fsm_set_and_search, but we still need the exclusive lock for the
update, so the contention point remains.  Maybe there's not sufficient
improvement to make a practical difference, so I'm not proposing
changing this.)

-- 
Álvaro Herrera

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5016181fd7..d024b4fa59 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8056,6 +8056,7 @@ heap_xlog_clean(XLogReaderState *record)
        xl_heap_clean *xlrec = (xl_heap_clean *) XLogRecGetData(record);
        Buffer          buffer;
        Size            freespace = 0;
+       bool            know_freespace = false;
        RelFileNode rnode;
        BlockNumber blkno;
        XLogRedoAction action;
@@ -8107,8 +8108,6 @@ heap_xlog_clean(XLogReaderState *record)
                                                                nowdead, ndead,
                                                                nowunused, nunused);
 
-               freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
-
                /*
                 * Note: we don't worry about updating the page's prunability hints.
                 * At worst this will cause an extra prune cycle to occur soon.
@@ -8118,16 +8117,16 @@ heap_xlog_clean(XLogReaderState *record)
                MarkBufferDirty(buffer);
        }
        if (BufferIsValid(buffer))
+       {
+               freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+               know_freespace = true;
                UnlockReleaseBuffer(buffer);
+       }
 
        /*
-        * Update the FSM as well.
-        *
-        * XXX: Don't do this if the page was restored from full page image. We
-        * don't bother to update the FSM in that case, it doesn't need to be
-        * totally accurate anyway.
+        * Update the FSM as well, if we can.
         */
-       if (action == BLK_NEEDS_REDO)
+       if (know_freespace)
                XLogRecordPageWithFreeSpace(rnode, blkno, freespace);
 }
 
@@ -8149,6 +8148,8 @@ heap_xlog_visible(XLogReaderState *record)
        Page            page;
        RelFileNode rnode;
        BlockNumber blkno;
+       Size            space;
+       bool            know_freespace = false;
        XLogRedoAction action;
 
        XLogRecGetBlockTag(record, 1, &rnode, NULL, &blkno);
@@ -8201,8 +8202,31 @@ heap_xlog_visible(XLogReaderState *record)
                 * wal_log_hints enabled.)
                 */
        }
+
        if (BufferIsValid(buffer))
+       {
+               space = PageGetFreeSpace(BufferGetPage(buffer));        /* for later */
+               know_freespace = true;
                UnlockReleaseBuffer(buffer);
+       }
+
+       /*
+        * Since FSM is not WAL-logged and only updated heuristically, it easily
+        * becomes stale in standbys.  If the standby is later promoted and runs
+        * VACUUM, it will skip updating individual free space figures for pages
+        * that became frozen, which is troublesome when FreeSpaceMapVacuum
+        * propagates too optimistic free space values to upper FSM layers; later
+        * inserters try to use such pages only to find out that they are
+        * unusable.  This can cause long stalls when there are many such pages.
+        *
+        * Forestall those problems by updating FSM's idea about a page that is
+        * becoming frozen.
+        *
+        * Do this regardless of full-page image being applied, since the FSM data
+        * is not in the page anyway.
+        */
+       if ((xlrec->flags & VISIBILITYMAP_ALL_FROZEN) && know_freespace)
+               XLogRecordPageWithFreeSpace(rnode, blkno, space);
 
        /*
         * Even if we skipped the heap page update due to the LSN interlock, it's
