https://postgr.es/m/20240512232923.aa.nmi...@google.com wrote: > Separable, nontrivial things not fixed in the attached patch stack: > > - Inplace update uses transactional CacheInvalidateHeapTuple(). ROLLBACK of > CREATE INDEX wrongly discards the inval, leading to the relhasindex=t loss > still seen in inplace-inval.spec. CacheInvalidateRelmap() does this right.
I plan to fix that like CacheInvalidateRelmap(): send the inval immediately, inside the critical section. Send it in heap_xlog_inplace(), too. The interesting decision is how to handle RelationCacheInitFilePreInvalidate(), which has an unlink_initfile() that can fail with e.g. EIO. Options: 1. Unlink during critical section, and accept that EIO becomes PANIC. Replay may reach the same EIO, and the system won't reopen to connections until the storage starts cooperating.a Interaction with checkpoints is not ideal. If we checkpoint and then crash between inplace XLogInsert() and inval, we'd be relying on StartupXLOG() -> RelationCacheInitFileRemove(). That uses elevel==LOG, so replay would neglect to PANIC on EIO. 2. Unlink before critical section, so normal xact abort suffices. This would hold RelCacheInitLock and a buffer content lock at the same time. In RecordTransactionCommit(), it would hold RelCacheInitLock and e.g. slru locks at the same time. The PANIC risk of (1) seems similar to the risk of PANIC at RecordTransactionCommit() -> XLogFlush(), which hasn't been a problem. The checkpoint-related risk bothers me more, and (1) generally makes it harder to reason about checkpoint interactions. The lock order risk of (2) feels tolerable. I'm leaning toward (2), but that might change. Other preferences? Another decision is what to do about LogLogicalInvalidations(). Currently, inplace update invalidations do reach WAL via LogLogicalInvalidations() at the next CCI. Options: a. Within logical decoding, cease processing invalidations for inplace updates. Inplace updates don't affect storage or interpretation of table rows, so they don't affect logicalrep_write_tuple() outcomes. If they did, invalidations wouldn't make it work. Decoding has no way to retrieve a snapshot-appropriate version of the inplace-updated value. b. Make heap_decode() of XLOG_HEAP_INPLACE recreate the invalidation. This would be, essentially, cheap insurance against invalidations having a benefit I missed in (a). I plan to pick (a). > - AtEOXact_Inval(true) is outside the RecordTransactionCommit() critical > section, but it is critical. We must not commit transactional DDL without > other backends receiving an inval. (When the inplace inval becomes > nontransactional, it will face the same threat.) This faces the same RelationCacheInitFilePreInvalidate() decision, and I think the conclusion should be the same as for inplace update. Thanks, nm