On Thu, Sep 9, 2021 at 8:33 PM Antonin Houska <a...@cybertec.at> wrote:
>
> The cfbot complained that the patch series no longer applies, so I've rebased
> it and also tried to make sure that the other flags become green.
>
> One particular problem was that pg_upgrade complained that "live undo data"
> remains in the old cluster. I found out that the temporary undo log causes the
> problem, so I've adjusted the query in check_for_undo_data() accordingly until
> the problem gets fixed properly.
>
> The problem of the temporary undo log is that it's loaded into local buffers
> and that backend can exit w/o flushing local buffers to disk, and thus we are
> not guaranteed to find enough information when trying to discard the undo log
> the backend wrote. I'm thinking about the following solutions:
>
> 1. Let the backend manage temporary undo log on its own (even the slot
> metadata would stay outside the shared memory, and in particular the
> insertion pointer could start from 1 for each session) and remove the
> segment files at the same moment the temporary relations are removed.
>
> However, by moving the temporary undo slots away from the shared memory,
> computation of oldestFullXidHavingUndo (see the PROC_HDR structure) would
> be affected. It might seem that a transaction which only writes undo log
> for temporary relations does not need to affect oldestFullXidHavingUndo,
> but it needs to be analyzed thoroughly. Since oldestFullXidHavingUndo
> prevents transactions from being truncated from the CLOG too early, I wonder
> if the following is possible (this scenario is only applicable to the zheap
> storage engine [1], which is not included in this patch, but should already
> be considered):
>
> A transaction creates a temporary table, does some (many) changes and then
> gets rolled back. The undo records are being applied and it takes some
> time. Since the XID of the transaction did not affect oldestFullXidHavingUndo,
> the XID can disappear from the CLOG due to truncation.
>
By the above, do you mean that in the zheap code we don't consider XIDs that
operate on temp tables/undo when computing oldestFullXidHavingUndo?

> However zundo.c in
> [1] indicates that the transaction status *is* checked during undo
> execution, so we might have a problem.
>

It would be easier to follow if you could tell which exact code you are
referring to here.

-- 
With Regards,
Amit Kapila.