Hello!

One more thing - I think build_new_indexes and
index_concurrently_create_copy are very close in semantics, so it
might be a good idea to refactor them a bit.

I’m still concerned about MVCC-related issues. For multiple
applications, this is a dealbreaker, because in some cases correctness
is a higher priority than availability.

Possible options:

1) Terminate connections with old snapshots.

Add a flag to terminate all connections with snapshots during the
ExclusiveLock period for the swap. From the application’s perspective,
this is not a big deal - it's similar to a primary switch. We would
also need to prevent new snapshots from being taken during the swap
transaction, so a short exclusive lock on ProcArrayLock would also be
required.

2) MVCC-safe two-phase approach (inspired by CREATE INDEX).

- copy the data from T1 to the new table T2.
- apply the log.
- take a table-exclusive lock on T1
- apply the log again.
- instead of swapping, mark the T2 as a kind of shadow table - any
transaction applying changes to T1 must also apply them to T2, while
reads still use T1 as the source of truth.
- commit (and record the transaction ID as XID1).
- at this point, all changes are applied to both tables with the same
XIDs because of the "shadow table" mechanism.
- wait until older snapshots no longer treat XID1 as uncommitted.
- now the tables are identical from the MVCC perspective.
- take an exclusive lock on both T1 and T2.
- perform the swap and drop T1.
- commit.

This is more complex and would require implementing some sort of
"shadow table" mechanism, so it might not be worth the effort. Option
1 feels more appealing to me.

If others think this is a good idea, I might try implementing a proof
of concept.

Best regards,
Mikhail


Reply via email to