Hi,

On 2019-04-25 17:12:33 -0400, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
> > My point was that given the current coding the code in
> > ATExecSetTableSpace() would make changes to the *old* relfilenode, after
> > having already copied the contents to the new relfilenode. Which means
> > that if ATExecSetTableSpace() is ever used on pg_class or one of it's
> > indexes, it'd just loose those changes, afaict.
> 
> Hmm.

I think there's no a live bug in because we a) require
allow_system_table_mods to modify system tables, and then b) have
another check

    /*
     * We cannot support moving mapped relations into different tablespaces.
     * (In particular this eliminates all shared catalogs.)
     */
    if (RelationIsMapped(rel))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot move system relation \"%s\"",
                        RelationGetRelationName(rel))));

that triggers even when allow_system_table_mods is off.


> There's another reason why we'd like the relcache contents to only change
> at CCI, which is that if we get a relcache invalidation somewhere before
> we get to the CCI, relcache.c would proceed to reload it based on the
> *current* catalog contents (ie, pre-update, thanks to the magic of MVCC),
> so that the entry would revert back to its previous state until we did get
> to CCI.  I wonder whether there's any demonstrable bug there.  Though
> you'd think the CLOBBER_CACHE_ALWAYS animals would've found it if so.

I think we basically assume that there's nothing triggering relcache
invals here inbetween updating the relcache entry, and doing the actual
catalog modification. That looks like it's currently somewhat OK in the
places we've talked about so far.

This made me look at cluster.c and damn, I'd forgotten about that.
Look at the following code in copy_table_data():

                /*
                 * When doing swap by content, any toast pointers written into 
NewHeap
                 * must use the old toast table's OID, because that's where the 
toast
                 * data will eventually be found.  Set this up by setting 
rd_toastoid.
                 * This also tells toast_save_datum() to preserve the toast 
value
                 * OIDs, which we want so as not to invalidate toast pointers in
                 * system catalog caches, and to avoid making multiple copies 
of a
                 * single toast value.
                 *
                 * Note that we must hold NewHeap open until we are done 
writing data,
                 * since the relcache will not guarantee to remember this 
setting once
                 * the relation is closed.  Also, this technique depends on the 
fact
                 * that no one will try to read from the NewHeap until after 
we've
                 * finished writing it and swapping the rels --- otherwise they 
could
                 * follow the toast pointers to the wrong place.  (It would 
actually
                 * work for values copied over from the old toast table, but 
not for
                 * any values that we toast which were previously not toasted.)
                 */
                NewHeap->rd_toastoid = OldHeap->rd_rel->reltoastrelid;
        }
        else
                *pSwapToastByContent = false;

which then goes on to do things like a full blown sort or index
scan. Sure, that's for the old relation, but that's so ugly. It works
because RelationClearRelation() copies over rd_toastoid :/.

Greetings,

Andres Freund


Reply via email to