Re: recovering from "found xmin ... from before relfrozenxid ..."

Ashutosh Sharma Tue, 18 Aug 2020 01:17:22 -0700

Hello Masahiko-san,

I've spent some more time trying to understand the code in
lazy_scan_heap function to know under what all circumstances a VACUUM
can fail with "found xmin ... before relfrozenxid ..." error for a
tuple whose xmin is behind relfrozenxid. Here are my observations:


1) It can fail with this error for a live tuple

OR,

2) It can also fail with this error if a tuple (that went through
update) is marked as HEAP_HOT_UPDATED or HEAP_ONLY_TUPLE.

OR,

3) If there are any concurrent transactions, then the tuple might be
marked as HEAPTUPLE_INSERT_IN_PROGRESS or HEAPTUPLE_DELETE_IN_PROGRESS
or HEAPTUPLE_RECENTLY_DEAD in which case also VACUUM can fail with
this error.

Now, AFAIU, as we will be dealing with a damaged table, the chances of
point #3 being the cause of this error looks impossible in our case
because I don't think we will be doing anything in parallel when
performing surgery on a damaged table, in fact we shouldn't be doing
any such things. However, it is quite possible that reason #2 could
cause VACUUM to fail with this sort of error, but, as we are already
skipping redirected item pointers in heap_force_common(), I think, we
would never be marking HEAP_HOT_UPDATED tuple as frozen and I don't
see any problem in marking HEAP_ONLY_TUPLE as frozen. So, probably, we
may not need to handle point #2 as well.

Further, I also don't see VACUUM reporting this error for a tuple that
has been moved from one partition to another. So, I think we might not
need to do any special handling for a tuple that got updated and its
new version was moved to another partition.

If you feel I am missing something here, please correct me. Thank you.

Moreover, while I was exploring on above, I noticed that in
lazy_scan_heap(), before we call HeapTupleSatisfiesVacuum() we check
for a redirected item pointers and if any redirected item pointer is
detected we do not call HeapTupleSatisfiesVacuum(). So, not sure how
HeapTupleSatisfiesVacuum would ever return a dead tuple that is marked
with HEAP_HOT_UPDATED. I am referring to the following code in
lazy_scan_heap().

        for (offnum = FirstOffsetNumber;
             offnum <= maxoff;
             offnum = OffsetNumberNext(offnum))
        {
            ItemId      itemid;

            itemid = PageGetItemId(page, offnum);

.............
.............


            /* Redirect items mustn't be touched */ <-- this check
would bypass the redirected item pointers from being checked for
HeapTupleSatisfiesVacuum.
            if (ItemIdIsRedirected(itemid))
            {
                hastup = true;  /* this page won't be truncatable */
                continue;
            }

..............
..............

            switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
            {
                case HEAPTUPLE_DEAD:

                    if (HeapTupleIsHotUpdated(&tuple) ||
                        HeapTupleIsHeapOnly(&tuple) ||
                        params->index_cleanup == VACOPT_TERNARY_DISABLED)
                        nkeep += 1;
                    else
                        tupgone = true; /* we can delete the tuple */
..............
..............
             }


So, the point is, would HeapTupleIsHotUpdated(&tuple) ever be true?

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Mon, Aug 17, 2020 at 11:35 AM Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
>
> Hello Masahiko-san,
>
> Thanks for the review. Please check the comments inline below:
>
> On Fri, Aug 14, 2020 at 10:07 AM Masahiko Sawada
> <masahiko.saw...@2ndquadrant.com> wrote:
>
> > Thank you for updating the patch! Here are my comments on v5 patch:
> >
> > --- a/contrib/Makefile
> > +++ b/contrib/Makefile
> > @@ -35,6 +35,7 @@ SUBDIRS = \
> >         pg_standby  \
> >         pg_stat_statements \
> >         pg_trgm     \
> > +       pg_surgery  \
> >         pgcrypto    \
> >
> > I guess we use alphabetical order here. So pg_surgery should be placed
> > before pg_trgm.
> >
>
> Okay, will take care of this in the next version of patch.
>
> > ---
> > +           if (heap_force_opt == HEAP_FORCE_KILL)
> > +               ItemIdSetDead(itemid);
> >
> > I think that if the page is an all-visible page, we should clear an
> > all-visible bit on the visibility map corresponding to the page and
> > PD_ALL_VISIBLE on the page header. Otherwise, index only scan would
> > return the wrong results.
> >
>
> I think we should let VACUUM do that. Please note that this module is
> intended to be used only on a damaged relation and should only be
> operated on damaged tuples of such relations. And the execution of any
> of the functions provided by this module on a damaged relation must be
> followed by VACUUM with DISABLE_PAGE_SKIPPING option on that relation.
> This is necessary to bring back a damaged relation to the sane state
> once a surgery is performed on it. I will try to add this note in the
> documentation for this module.
>
> > ---
> > +       /*
> > +        * We do not mark the buffer dirty or do WAL logging for unmodifed
> > +        * pages.
> > +        */
> > +       if (!did_modify_page)
> > +           goto skip_wal;
> > +
> > +       /* Mark buffer dirty before we write WAL. */
> > +       MarkBufferDirty(buf);
> > +
> > +       /* XLOG stuff */
> > +       if (RelationNeedsWAL(rel))
> > +           log_newpage_buffer(buf, true);
> > +
> > +skip_wal:
> > +       END_CRIT_SECTION();
> > +
> >
> > s/unmodifed/unmodified/
> >
>
> okay, will fix this typo.
>
> > Do we really need to use goto? I think we can modify it like follows:
> >
> >     if (did_modity_page)
> >     {
> >        /* Mark buffer dirty before we write WAL. */
> >        MarkBufferDirty(buf);
> >
> >        /* XLOG stuff */
> >        if (RelationNeedsWAL(rel))
> >            log_newpage_buffer(buf, true);
> >     }
> >
> >     END_CRIT_SECTION();
> >
>
> No, we don't need it. We can achieve the same by checking the status
> of did_modify_page flag as you suggested. I will do this change in the
> next version.
>
> > ---
> > pg_force_freeze() can revival a tuple that is already deleted but not
> > vacuumed yet. Therefore, the user might need to reindex indexes after
> > using that function. For instance, with the following script, the last
> > two queries: index scan and seq scan, will return different results.
> >
> > set enable_seqscan to off;
> > set enable_bitmapscan to off;
> > set enable_indexonlyscan to off;
> > create table tbl (a int primary key);
> > insert into tbl values (1);
> >
> > update tbl set a = a + 100 where a = 1;
> >
> > explain analyze select * from tbl where a < 200;
> >
> > -- revive deleted tuple on heap
> > select heap_force_freeze('tbl', array['(0,1)'::tid]);
> >
> > -- index scan returns 2 tuples
> > explain analyze select * from tbl where a < 200;
> >
> > -- seq scan returns 1 tuple
> > set enable_seqscan to on;
> > explain analyze select * from tbl;
> >
>
> I am not sure if this is the right use-case of pg_force_freeze
> function. I think we should only be running pg_force_freeze function
> on a tuple for which VACUUM reports "found xmin ABC from before
> relfrozenxid PQR" sort of error otherwise it might worsen the things
> instead of making it better. Now, the question is - can VACUUM report
> this type of error for a deleted tuple or it would only report it for
> a live tuple? AFAIU this won't be reported for the deleted tuples
> because VACUUM wouldn't consider freezing a tuple that has been
> deleted.
>
> > Also, if a tuple updated and moved to another partition is revived by
> > heap_force_freeze(), its ctid still has special values:
> > MovedPartitionsOffsetNumber and MovedPartitionsBlockNumber. I don't
> > see a problem yet caused by a visible tuple having the special ctid
> > value, but it might be worth considering either to reset ctid value as
> > well or to not freezing already-deleted tuple.
> >
>
> For this as well, the answer remains the same as above.
>
> --
> With Regards,
> Ashutosh Sharma
> EnterpriseDB:http://www.enterprisedb.com

Re: recovering from "found xmin ... from before relfrozenxid ..."

Reply via email to