On Wed, Apr 18, 2018 at 7:37 AM, Wood, Dan <hexp...@amazon.com> wrote:
>
> My analysis is that heap_prepare_freeze_tuple->FreezeMultiXactId()
> returns FRM_NOOP if the MultiXACT locked rows haven't committed. This
> results in changed=false and totally_frozen=true (as initialized). When
> this returns to lazy_scan_heap(), no rows are added to the frozen[] array.
> Yet, tuple_totally_frozen is true. This means the page is marked frozen in
> the VM, even though the MultiXACT row was left untouched.
>
> A fix to heap_prepare_freeze_tuple() that seems to do the trick is:
>         else
>         {
>                 Assert(flags & FRM_NOOP);
> +               totally_frozen = false;
>         }
>

That's a great find! This can definitely lead to various problems and could
be one of the reasons behind the issue reported here [1]. For example, if we
change the end of the script slightly, we get the same error as the one in
the bug report.

sleep 4;    # Wait for share locks to be released

# See if another vacuum freeze advances relminmxid beyond xmax present in the
# heap
echo "vacuum (verbose, freeze) t;" | $p
echo "select pg_check_frozen('t');" | $p

# See if a vacuum freeze scanning all pages corrects the problem
echo "vacuum (verbose, freeze, disable_page_skipping) t;" | $p
echo "select pg_check_frozen('t');" | $p

Thanks,
Pavan

[1] https://www.postgresql.org/message-id/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K%2BA%40mail.gmail.com

--
Pavan Deolasee                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
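The control flow described in the quoted analysis (FRM_NOOP leaves changed=false and totally_frozen=true, so the page can still be marked all-frozen) can be illustrated with a deliberately simplified, hypothetical C model. Every name below (model_frm_flags, model_freeze_result, model_prepare_freeze_xmax(), model_page_all_frozen()) is invented for illustration and is not PostgreSQL code. Only the "xmax is a MultiXact" branch is modeled, the other FreezeMultiXactId() outcomes are ignored, and the apply_fix parameter simply toggles whether the proposed `totally_frozen = false;` line is present.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for two FreezeMultiXactId() outcomes;
 * NOT PostgreSQL's actual flag definitions. */
typedef enum
{
    MODEL_FRM_NOOP = 0x01,            /* MultiXact still live: leave xmax alone */
    MODEL_FRM_INVALIDATE_XMAX = 0x02  /* MultiXact no longer relevant: clear xmax */
} model_frm_flags;

/* Hypothetical per-tuple result, mirroring the two outputs discussed above. */
typedef struct
{
    bool changed;        /* tuple needs a freeze entry (would go into frozen[]) */
    bool totally_frozen; /* tuple needs no future freezing */
} model_freeze_result;

/* Simplified model of the MultiXact-xmax branch only.
 * apply_fix selects whether the proposed "totally_frozen = false" is present. */
static model_freeze_result
model_prepare_freeze_xmax(int flags, bool apply_fix)
{
    /* Both outputs start from their initial values, as in the analysis. */
    model_freeze_result r = { .changed = false, .totally_frozen = true };

    if (flags & MODEL_FRM_INVALIDATE_XMAX)
    {
        r.changed = true;               /* xmax will be cleared */
    }
    else
    {
        assert(flags & MODEL_FRM_NOOP);
        if (apply_fix)
            r.totally_frozen = false;   /* the proposed one-line fix */
    }
    return r;
}

/* Simplified model of the caller's decision: a page may be marked
 * all-frozen in the VM only if every tuple reported totally_frozen. */
static bool
model_page_all_frozen(const model_freeze_result *results, int ntuples)
{
    for (int i = 0; i < ntuples; i++)
        if (!results[i].totally_frozen)
            return false;
    return true;
}
```

In this model, a page whose only tuple gets a NOOP result is reported all-frozen without the fix even though nothing was frozen (changed is false); with the fix, totally_frozen becomes false and the page is no longer treated as all-frozen.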