On Wed, Nov 29, 2023 at 2:59 AM Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > > I have been hacking on improving the improvements outlined in my > preceding e-mail, but I have some bad news - I ran into an issue that I > don't know how to solve :-( > > Consider this transaction: > > BEGIN; > ALTER SEQUENCE s RESTART 1000; > > SAVEPOINT s1; > ALTER SEQUENCE s RESTART 2000; > ROLLBACK TO s1; > > INSERT INTO seq_test SELECT nextval('s') FROM generate_series(1,40); > COMMIT; > > If you try this with the approach relying on rd_newRelfilelocatorSubid > and rd_createSubid, it fails like this on the subscriber: > > ERROR: could not map filenode "base/5/16394" to relation OID > > This happens because ReorderBufferQueueSequence tries to do this in the > non-transactional branch: > > reloid = RelidByRelfilenumber(rlocator.spcOid, rlocator.relNumber); > > and the relfilenode is the one created by the first ALTER. But this is > obviously wrong - the changes should have been treated as transactional, > because they are tied to the first ALTER. So how did we get there? > > Well, the whole problem is that in case of abort, AtEOSubXact_cleanup > resets the two fields to InvalidSubTransactionId. Which means the > rollback in the above transaction also forgets about the first ALTER. > Now that I look at the RelationData comments, it actually describes > exactly this situation: > > * > * rd_newRelfilelocatorSubid is the ID of the highest subtransaction > * the most-recent relfilenumber change has survived into or zero if > * not changed in the current transaction (or we have forgotten > * changing it). This field is accurate when non-zero, but it can be > * zero when a relation has multiple new relfilenumbers within a > * single transaction, with one of them occurring in a subsequently > * aborted subtransaction, e.g. > * BEGIN; > * TRUNCATE t; > * SAVEPOINT save; > * TRUNCATE t; > * ROLLBACK TO save; > * -- rd_newRelfilelocatorSubid is now forgotten > * > > The root of this problem is that we'd need some sort of "history" for > the field, so that when a subxact aborts, we can restore the previous > value. But we obviously don't have that, and I doubt we want to add that > to relcache - for example, it'd either need to impose some limit on the > history (and thus a failure when we reach the limit), or it'd need to > handle histories of arbitrary length. >
Yeah, I think that would be really tricky and we may not want to go there. > At this point I don't see a solution for this, which means the best way > forward with the sequence decoding patch seems to be the original > approach, on the decoding side. > One thing that worries me about that approach is that it can suck with the workload that has a lot of DDLs that create XLOG_SMGR_CREATE records. We have previously fixed some such workloads in logical decoding where decoding a transaction containing truncation of a table with a lot of partitions (1000 or more) used to take a very long time. Don't we face performance issues in such scenarios? How do we see this work w.r.t to some sort of global sequences? There is some recent discussion where I have raised a similar point [1]. [1] - https://www.postgresql.org/message-id/CAA4eK1JF%3D4_Eoq7FFjHSe98-_ooJ5QWd0s2_pj8gR%2B_dvwKxvA%40mail.gmail.com -- With Regards, Amit Kapila.