Re: recovering from "found xmin ... from before relfrozenxid ..."

Rajkumar Raghuwanshi Thu, 06 Aug 2020 01:57:04 -0700

I have been doing some user-level testing of this feature, apart from
sanity test for extension and it's functions


I have tried to corrupt tuples and then able to fix it using
heap_force_freeze/kill functions.


--corrupt relfrozenxid, cause vacuum failed.

update pg_class set relfrozenxid = (relfrozenxid::text::integer +
10)::text::xid where relname = 'test_tbl';

UPDATE 1

insert into test_tbl values (2, 'BBB');


postgres=# vacuum test_tbl;

ERROR:  found xmin 507 from before relfrozenxid 516

CONTEXT:  while scanning block 0 of relation "public.test_tbl"


postgres=# select *, ctid, xmin, xmax from test_tbl;

 a |  b  | ctid  | xmin | xmax

---+-----+-------+------+------

 1 | AAA | (0,1) |  505 |    0

 2 | BBB | (0,2) |  507 |    0

(2 rows)


--fixed using heap_force_freeze

postgres=# select heap_force_freeze('test_tbl'::regclass,
ARRAY['(0,2)']::tid[]);

 heap_force_freeze

-------------------


postgres=# vacuum test_tbl;

VACUUM

postgres=# select *, ctid, xmin, xmax from test_tbl;

 a |  b  | ctid  | xmin | xmax

---+-----+-------+------+------

 1 | AAA | (0,1) |  505 |    0

 2 | BBB | (0,2) |    2 |    0

(2 rows)


--corrupt table headers in base/oid. file, cause table access failed.

postgres=# select ctid, * from test_tbl;

ERROR:  could not access status of transaction 4294967295

DETAIL:  Could not open file "pg_xact/0FFF": No such file or directory.


--removed corrupted tuple using heap_force_kill

postgres=# select heap_force_kill('test_tbl'::regclass,
ARRAY['(0,2)']::tid[]);

 heap_force_kill

-----------------



(1 row)


postgres=# select ctid, * from test_tbl;

 ctid  | a |  b

-------+---+-----

 (0,1) | 1 | AAA

(1 row)


I will be continuing with my testing with the latest patch and update here
if found anything.


Thanks & Regards,
Rajkumar Raghuwanshi


On Thu, Aug 6, 2020 at 1:42 PM Masahiko Sawada <
[email protected]> wrote:

> On Wed, 5 Aug 2020 at 22:42, Ashutosh Sharma <[email protected]>
> wrote:
> >
> > Hi Robert,
> >
> > Thanks for the review. Please find my comments inline:
> >
> > On Sat, Aug 1, 2020 at 12:18 AM Robert Haas <[email protected]>
> wrote:
> > >
> > > On Fri, Jul 31, 2020 at 8:52 AM Ashutosh Sharma <[email protected]>
> wrote:
> > > > Attached is the new version of patch that addresses the comments
> from Andrey and Beena.
> > >
> > > +PGFILEDESC = "pg_surgery - perform surgery on the damaged heap table"
> > >
> > > the -> a
> > >
> > > I also suggest: heap table -> relation, because we might want to
> > > extend this to other cases later.
> > >
> >
> > Corrected.
> >
> > > +-- toast table.
> > > +begin;
> > > +create table ttab(a text);
> > > +insert into ttab select string_agg(chr(floor(random() * 26)::int +
> > > 65), '') from generate_series(1,10000);
> > > +select * from ttab where xmin = 2;
> > > + a
> > > +---
> > > +(0 rows)
> > > +
> > > +select heap_force_freeze('ttab'::regclass, ARRAY['(0, 1)']::tid[]);
> > > + heap_force_freeze
> > > +-------------------
> > > +
> > > +(1 row)
> > > +
> > >
> > > I don't understand the point of this. You're not testing the function
> > > on the TOAST table; you're testing it on the main table when there
> > > happens to be a TOAST table that is probably getting used for
> > > something. But that's not really relevant to what is being tested
> > > here, so as written this seems redundant with the previous cases.
> > >
> >
> > Yeah, it's being tested on the main table, not on a toast table. I've
> > removed this test-case and also restricted direct access to the toast
> > table using heap_force_kill/freeze functions. I think we shouldn't be
> > using these functions to do any changes in the toast table. We will
> > only use these functions with the main table and let VACUUM remove the
> > corresponding data chunks (pointed by the tuple that got removed from
> > the main table).
> >
> > Another option would be to identify all the data chunks corresponding
> > to the tuple (ctid) being killed from the main table and remove them
> > one by one. We will only do this if the tuple from the main table that
> > has been marked as killed has an external storage. We will have to add
> > a bunch of code for this otherwise we can let VACUUM do this for us.
> > Let me know your thoughts on this.
> >
> > > +-- test pg_surgery functions with the unsupported relations. Should
> fail.
> > >
> > > Please name the specific functions being tested here in case we add
> > > more in the future that are tested separately.
> > >
> >
> > Done.
> >
> > > +++ b/contrib/pg_surgery/heap_surgery_funcs.c
> > >
> > > I think we could drop _funcs from the file name.
> > >
> >
> > Done.
> >
> > > +#ifdef PG_MODULE_MAGIC
> > > +PG_MODULE_MAGIC;
> > > +#endif
> > >
> > > The #ifdef here is not required, and if you look at other contrib
> > > modules you'll see that they don't have it.
> > >
> >
> > Okay, done.
> >
> > > I don't like all the macros at the top of the file much. CHECKARRVALID
> > > is only used in one place, so it seems to me that you might as well
> > > just inline it and lose the macro. Likewise for SORT and ARRISEMPTY.
> > >
> >
> > Done.
> >
> > > Once you do that, heap_force_common() can just compute the number of
> > > array elements once, instead of doing it once inside ARRISEMPTY, then
> > > again inside SORT, and then a third time to initialize ntids. You can
> > > just have a local variable in that function that is set once and then
> > > used as needed. Then you are only doing ARRNELEMS once, so you can get
> > > rid of that macro too. The same technique can be used to get rid of
> > > ARRPTR. So then all the macros go away without introducing any code
> > > duplication.
> > >
> >
> > Done.
> >
> > > +/* Options to forcefully change the state of a heap tuple. */
> > > +typedef enum HTupleForceOption
> > > +{
> > > + FORCE_KILL,
> > > + FORCE_FREEZE
> > > +} HTupleForceOption;
> > >
> > > I suggest un-abbreviating HTuple to HeapTuple and un-abbreviating the
> > > enum members to HEAP_FORCE_KILL and HEAP_FORCE_FREE.
> >
> > Done.
> >
> > Also, how about
> > > option -> operation?
> > >
> >
> > I think both look okay to me.
> >
> > > + return heap_force_common(fcinfo, FORCE_KILL);
> > >
> > > I think it might be more idiomatic to use PG_RETURN_DATUM here.  I
> > > know it's the same thing, though, and perhaps I'm even wrong about the
> > > prevailing style.
> > >
> >
> > Done.
> >
> > > + Assert(force_opt == FORCE_KILL || force_opt == FORCE_FREEZE);
> > >
> > > I think this is unnecessary. It's an enum with 2 values.
> > >
> >
> > Removed.
> >
> > > + if (ARRISEMPTY(ta))
> > > + ereport(ERROR,
> > > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > > + errmsg("empty tid array")));
> > >
> > > I don't see why this should be an error. Why can't we just continue
> > > normally and if it does nothing, it does nothing? You'd need to change
> > > the do..while loop to a while loop so that the end condition is tested
> > > at the top, but that seems fine.
> > >
> >
> > I think it's okay to have this check. So, just left it as-is. We do
> > have such checks in other contrib modules as well wherever the array
> > is being passed as an input to the function.
> >
> > > + rel = relation_open(relid, AccessShareLock);
> > >
> > > Maybe we should take RowExclusiveLock, since we are going to modify
> > > rows. Not sure how much it matters, though.
> > >
> >
> > I don't know how it would make a difference, but still as you said
> > replaced AccessShare with RowExclusive.
> >
> > > + if (!superuser() && GetUserId() != rel->rd_rel->relowner)
> > > + ereport(ERROR,
> > > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > > + errmsg("must be superuser or object owner to use %s.",
> > > + force_opt == FORCE_KILL ? "heap_force_kill()" :
> > > + "heap_force_freeze()")));
> > >
> > > This is the wrong way to do a permissions check, and it's also the
> > > wrong way to write an error message about having failed one. To see
> > > the correct method, grep for pg_class_aclcheck. However, I think that
> > > we shouldn't in general trust the object owner to do this, unless the
> > > super-user gave permission. This is a data-corrupting operation, and
> > > only the boss is allowed to authorize it. So I think you should also
> > > add REVOKE EXECUTE FROM PUBLIC statements to the SQL file, and then
> > > have this check as a backup. Then, the superuser is always allowed,
> > > and if they choose to GRANT EXECUTE on this function to some users,
> > > those users can do it for their own relations, but not others.
> > >
> >
> > Done.
> >
> > > + if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
> > > + ereport(ERROR,
> > > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > > + errmsg("only heap AM is supported")));
> > > +
> > > + check_relation_relkind(rel);
> > >
> > > Seems like these checks are in the wrong order.
> >
> > I don't think there is anything wrong with the order. I can see the
> > same order in other contrib modules as well.
> >
> > Also, maybe you could
> > > call the function something like check_relation_ok() and put the
> > > permissions test, the relkind test, and the relam test all inside of
> > > it, just to tighten up the code in this main function a bit.
> > >
> >
> > Yeah, I've added a couple of functions named sanity_check_relation and
> > sanity_check_tid_array and shifted all the sanity checks there.
> >
> > > + if (noffs > maxoffset)
> > > + ereport(ERROR,
> > > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > > + errmsg("number of offsets specified for block %u exceeds the max
> > > offset number %u",
> > > + blkno, maxoffset)));
> > >
> > > Hmm, this doesn't seem quite right. The actual problem is if an
> > > individual item pointer's offset number is greater than maxoffset,
> > > which can be true even if the total number of offsets is less than
> > > maxoffset. So I think you need to remove this check and add a check
> > > inside the loop which follows that offnos[i] is in range.
> > >
> >
> > Agreed and done.
> >
> > > The way you've structured that loop is actually problematic -- I don't
> > > think we want to be calling elog() or ereport() inside a critical
> > > section. You could fix the case that checks for an invalid force_opt
> > > by just doing if (op == HEAP_FORCE_KILL) { ... } else { Assert(op ==
> > > HEAP_FORCE_FREEZE); ... }, or by using a switch with no default. The
> > > NOTICE case you have here is a bigger problem.
> >
> > Done.
> >
> > You really cannot
> > > modify the buffer like this and then decide, oops, never mind, I think
> > > I won't mark it dirty or write WAL for the changes. If you do that,
> > > the buffer is still in memory, but it's now been modified. A
> > > subsequent operation that modifies it will start with the altered
> > > state you created here, quite possibly leading to WAL that cannot be
> > > correctly replayed on the standby. In other words, you've got to
> > > decide for certain whether you want to proceed with the operation
> > > *before* you enter the critical section. You also need to emit any
> > > messages before or after the critical section.  So you could:
> > >
> >
> > This is still not clear. I think Robert needs to respond to my earlier
> comment.
> >
> > > I believe this violates our guidelines on message construction. Have
> > > two completely separate messages -- and maybe lose the word "already":
> > >
> > > "skipping tid (%u, %u) because it is dead"
> > > "skipping tid (%u, %u) because it is unused"
> > >
> > > The point of this is that it makes it easier for translators.
> > >
> >
> > Done.
> >
> > > I see very little point in what verify_tid() is doing. Before using
> > > each block number, we should check that it's less than or equal to a
> > > cached value of RelationGetNumberOfBlocks(rel). That's necessary in
> > > any case to avoid funny errors; and then the check here against
> > > specifically InvalidBlockNumber is redundant. For the offset number,
> > > same thing: we need to check each offset against the page's
> > > PageGetMaxOffsetNumber(page); and if we do that then we don't need
> > > these checks.
> > >
> >
> > Done.
> >
> > Please check the attached patch for the changes.
>
> I also looked at this version patch and have some small comments:
>
> +   Oid             relid = PG_GETARG_OID(0);
> +   ArrayType      *ta = PG_GETARG_ARRAYTYPE_P_COPY(1);
> +   ItemPointer     tids;
> +   int             ntids;
> +   Relation        rel;
> +   Buffer          buf;
> +   Page            page;
> +   ItemId          itemid;
> +   BlockNumber     blkno;
> +   OffsetNumber   *offnos;
> +   OffsetNumber    offno,
> +                   noffs,
> +                   curr_start_ptr,
> +                   next_start_ptr,
> +                   maxoffset;
> +   int             i,
> +                   nskippedItems,
> +                   nblocks;
>
> You declare all variables at the top of heap_force_common() function
> but I think we can declare some variables such as buf, page inside of
> the do loop.
>
> ---
> +           if (offnos[i] > maxoffset)
> +           {
> +               ereport(NOTICE,
> +                        errmsg("skipping tid (%u, %u) because it
> contains an invalid offset",
> +                               blkno, offnos[i]));
> +               continue;
> +           }
>
> If all tids on a page take the above path, we will end up logging FPI
> in spite of modifying nothing on the page.
>
> ---
> +       /* XLOG stuff */
> +       if (RelationNeedsWAL(rel))
> +           log_newpage_buffer(buf, true);
>
> I think we need to set the returned LSN by log_newpage_buffer() to the
> page lsn.
>
> Regards,
>
> --
> Masahiko Sawada            http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>

Re: recovering from "found xmin ... from before relfrozenxid ..."

Reply via email to