On Fri, 8 Nov 2024 at 17:11, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
>
> We recently had a customer report a very strange problem, involving a
> very large insert-only table: without explanation, insertions would
> stall for several seconds, causing application timeout and process
> accumulation and other nastiness.
>
> After some investigation, we narrowed this down to happening immediately
> after the first VACUUM on the table right after a standby got promoted.
> It wasn't at first obvious what the connection between these factors
> was, but eventually we realized that VACUUM must have been skipping a
> bunch of pages because they had been marked all-frozen previously, so
> the FSM was not updated with the correct freespace figures for those
> pages.  The FSM pages had been transmitted as full-page images on WAL
> before the promotion (because of wal_log_hints), so they contained
> optimistic figures for the amount of free space, coming from the previous
> master.  (Because this only happens on the first change to that FSM page
> after a checkpoint, it's quite likely that one page every few thousand
> or so contains optimistic figures while the others remain all zeroes, or
> something like that.)
>
> Before VACUUM, nothing too bad would happen, because the upper layers of
> the FSM would not know about those optimistic numbers.  But when VACUUM
> does FreeSpaceMapVacuum, it propagates those numbers upwards; as soon as
> that happens, inserters looking for pages would be told about those
> pages (wrongly catalogued to contain sufficient free space), go to
> insert there, and fail because there isn't actually any freespace; ask
> FSM for another page, lather, rinse, repeat until those pages are
> all catalogued correctly by FSM, at which point things continue
> normally.  (There are many processes doing this chase-up concurrently
> and it seems a pretty contentious process, about which see last
> paragraph; it can be seen in pg_xlogdump that it takes several seconds
> for things to settle).
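
To make the failure mode described above concrete, here is a toy Python
model. It is not PostgreSQL code: ToyFSM, propagate, search,
record_and_search and insert_one are made-up names, and the real FSM is a
multi-level tree of pages rather than a list with a single root value. The
sketch only assumes what the paragraph says: stale leaf values are invisible
until they are propagated upwards, and each inserter that lands on a full
page corrects that one entry and asks again.

```python
# Toy model of the stale-FSM chase; NOT PostgreSQL code.
# All names here are hypothetical and exist only for illustration.

class ToyFSM:
    def __init__(self, leaf_estimates):
        self.leaves = list(leaf_estimates)  # per-heap-page free-space estimates
        self.root = 0                       # upper level: knows nothing until propagated

    def propagate(self):
        """Stand-in for the upward propagation done by FreeSpaceMapVacuum."""
        self.root = max(self.leaves, default=0)

    def search(self, needed):
        """Return a page whose estimate is >= needed, or None."""
        if self.root < needed:
            return None  # the upper level says no page is big enough
        for page, estimate in enumerate(self.leaves):
            if estimate >= needed:
                return page
        return None

    def record_and_search(self, page, actual, needed):
        """Correct one page's estimate, then search again
        (loosely modelled on RecordAndGetPageWithFreeSpace)."""
        self.leaves[page] = actual
        self.propagate()
        return self.search(needed)


def insert_one(fsm, actual_free, needed):
    """Return (page_used, wasted_attempts).

    page_used is None when the inserter gives up on the FSM and would
    extend the relation instead.
    """
    wasted = 0
    page = fsm.search(needed)
    while page is not None and actual_free[page] < needed:
        wasted += 1  # went to the page, found it full
        page = fsm.record_and_search(page, actual_free[page], needed)
    return page, wasted


# Eight heap pages that are all actually full; two leaf entries carry
# optimistic values left over from the old primary.
actual = [0] * 8
fsm = ToyFSM([0, 0, 200, 0, 0, 200, 0, 0])

before_vacuum = insert_one(fsm, actual, 100)   # root still 0: nothing is offered
fsm.propagate()                                # the first VACUUM after promotion
after_vacuum = insert_one(fsm, actual, 100)    # chases both stale pages
settled = insert_one(fsm, actual, 100)         # FSM is now correct again
```

With these inputs, before_vacuum is (None, 0), after_vacuum is (None, 2) and
settled is (None, 0): the wasted visits appear only once the optimistic
values have been propagated, and they stop as soon as every stale entry has
been corrected. The model is single-threaded, so it says nothing about the
lock contention among concurrent inserters.
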
>
> After considering several possible solutions, I propose to have
> heap_xlog_visible compute free space for any page being marked frozen;
> Pavan adds to that to have heap_xlog_clean compute free space for all
> pages also.  This means that if we later promote this standby and VACUUM
> skips all-frozen pages, their FSM numbers are going to be up-to-date
> anyway.  Patch attached.
>
>
> Now, it's possible that the problem occurs for all-visible pages not
> just all-frozen.  I haven't seen that one, maybe there's some reason why
> it cannot.  But fixing both things together is an easy change in the
> proposed patch: just do it on xlrec->flags != 0 rather than checking for
> the specific all-frozen flag.
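
The two variants of the proposed check can be sketched in the same toy
style. Again this is Python, not the actual patch: replay_visible and its
frozen_only parameter are hypothetical, and the FSM is reduced to a dict
from page number to recorded free space. The flag values mirror
PostgreSQL's VISIBILITYMAP_ALL_VISIBLE (0x01) and VISIBILITYMAP_ALL_FROZEN
(0x02).

```python
# Toy sketch of the replay-side fix; NOT the actual patch.
ALL_VISIBLE = 0x01  # mirrors VISIBILITYMAP_ALL_VISIBLE
ALL_FROZEN = 0x02   # mirrors VISIBILITYMAP_ALL_FROZEN


def replay_visible(fsm_leaves, page_no, flags, actual_free, frozen_only=True):
    """Hypothetical stand-in for the FSM update during replay.

    frozen_only=True  -> refresh the FSM only for pages marked all-frozen.
    frozen_only=False -> refresh whenever any flag is set (flags != 0).
    """
    if frozen_only:
        refresh = bool(flags & ALL_FROZEN)
    else:
        refresh = flags != 0
    if refresh:
        fsm_leaves[page_no] = actual_free  # replace the stale estimate


# Page 7 is only all-visible; page 8 is all-visible and all-frozen.
leaves = {7: 200, 8: 200}
replay_visible(leaves, 7, ALL_VISIBLE, 0, frozen_only=True)                # stale value kept
replay_visible(leaves, 8, ALL_VISIBLE | ALL_FROZEN, 16, frozen_only=True)  # value refreshed
```

After these two calls leaves is {7: 200, 8: 16}. Replaying page 7 with
frozen_only=False would set its entry to 0 as well, which corresponds to the
broader "flags != 0" variant.
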
>
> (This problem seems to be made worse by the fact that
> RecordAndGetPageWithFreeSpace (or rather fsm_set_and_search) holds
> exclusive lock on the FSM page for the whole duration of update plus
> search.  So when there are many inserters, they all race to the update
> process.  Maybe it'd be less terrible if we would release exclusive
> after the update and grab shared lock for the search in
> fsm_set_and_search, but we still have to have the exclusive for the
> update, so the contention point remains.  Maybe there's not sufficient
> improvement to make a practical difference, so I'm not proposing
> changing this.)
>
> --
> Álvaro Herrera

Hi!
Sorry for reviving such an old thread. While looking into some
FSM-related questions today, I noticed that the comment in the
`heap_xlog_visible` function is improperly punctuated.
As far as I can tell, this is an oversight in ab7dbd6, the commit
that was proposed in this thread.

I'd like to propose a fix for that.

Sorry for the noise over such a minor matter.

-- 
Best regards,
Kirill Reshke

Attachment: v1-0001-Fixup-FSM-comment-inside-heap_xlog_visible.patch