Re: 'ERROR: attempted to update invisible tuple' from 'ALTER INDEX ... ATTACH PARTITION' on parent index

Robert Haas Wed, 12 Jul 2023 07:02:18 -0700

On Wed, Jul 12, 2023 at 4:26 AM Michael Paquier <mich...@paquier.xyz> wrote:
> Taking the test case of upthread, from what I can see, the ALTER TABLE
> .. REPLICA IDENTITY registers two relcache invalidations for pk_foo
> (via RegisterRelcacheInvalidation), which is the relcache entry whose
> stuff is messed up.  I would have expected a refresh of the cache of
> pk_foo to happen when doing the ALTER INDEX .. ATTACH PARTITION, but
> for some reason it does not happen when running the whole in a
> transaction block.  I cannot put my finger on what's wrong for the
> moment, but based on my current impressions the inval requests are
> correctly registered when switching the replica identity, but nothing
> about pk_foo is updated when attaching a partition to it in the last
> step where the invisible update happens :/


I'm not sure exactly what is happening here, but it looks to me like
ATExecReplicaIdentity() only takes ShareLock on the index and
nevertheless feels entitled to update the pg_index tuple, which is
pretty strange. We normally require AccessExclusiveLock to perform DDL
on an object, and in the limited exceptions that we have to that rule
- see AlterTableGetLockLevel - it's pretty much always a
self-exclusive lock. Otherwise, two backends might try to do the same
DDL operation at the same time, which would lead to low-level failures
trying to update the same tuple such as the one seen here.

But even if that doesn't happen or is prevented by some other
mechanism, there's still a synchronization problem. Suppose backend B1
modifies some state via a DDL operation on table T and then afterward
backend B2 wants to perform a non-DDL operation that depends on that
state. Well, B1 takes some lock on the relation, and B2 takes a lock
that would conflict with it, and that guarantees that B2 starts after
B1 commits. That means that B2 is guaranteed to see the invalidations
that were queued by B1, which means it will flush any state out of its
cache that was made stale by the operation performed by B1. If the
locks didn't conflict, B2 might start before B1 committed and either
fail to update its caches or update them but with a version of the
tuples that's about to be made obsolete when B1 commits. So ShareLock
doesn't feel like a very safe choice here.

But I'm not quite sure exactly what's going wrong, either. Every
update is going to call CacheInvalidateHeapTuple(), and updating
either an index's pg_class tuple or its pg_index tuple should queue up
a relcache invalidation, and CommandEndInvalidationMessages() should
cause that to be processed. If this were multiple transactions, the
only thing that would be different is that the invalidation messages
would be in the shared queue, so maybe there's something going on with
the timing of CommandEndInvalidationMessages() vs.
AcceptInvalidationMessages() that accounts for the problem occurring
in one case but not the other. But I do wonder whether the underlying
problem is that what ATExecReplicaIdentity() is doing is not really
safe.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: 'ERROR: attempted to update invisible tuple' from 'ALTER INDEX ... ATTACH PARTITION' on parent index

Reply via email to