On Wed, Jan 29, 2025 at 03:00:54PM +0300, Vitaly Davydov wrote: > It seems, there are much deeper problems with twophase transactions > as I thought. I'm interested in fixing twophase transactions, > because I support a solution which actively uses twophase > transactions. I'm interested to get more deeply into the twophase > functionality. Below, I just want to clarify some ideas behind > twophase transactions. I appreciate, if you comment my point of > view, or just ignore this email if you find it too naive and boring.
(Please be careful about your email indentation. These are hard to read.) > Two phase files are created after checkpoint to keep twophase states > on disk after WAL truncation. For transactions, that are inside the > checkpoint horizon, we do not create two phase files because the > states are currently stored in the WAL. Yeah, the oldest XID horizon stored in the checkpoint records would be stuck if you have a long-running 2PC transaction. It's actually something I have been brewing a bit for the last few days: we should discard early at recovery files that are older than the horizon in the checkpoint record we've retrieved at the beginning of recovery. So the early check can be tighter with past files. > Based on the thesis above, I guess, we have to read only those > twophase files which are related to the transactions before the > latest checkpoint. Its full xid should be lesser than > TransamVariables->nextXid (which is the same as > ControlFile->checkPointCopy.nextXid at the moment of StartupXLOG -> > restoreTwoPhaseData call). The files with greater (or equal) full > xids, should be ignored and removed. That's all what we need in > restoreTwoPhaseData, I believe. Yeah. It is important to do that only in restoreTwoPhaseData(). The check about already-aborted and already-committed 2PC files is something we have to keep as well. We should only do it at the end of recovery. > In the current implementation, such check is applied in > restoreTwoPhaseData -> ProcessTwoPhaseBuffer but after checking for > xid in CLOG. I'm not sure, why we check CLOG here. Once we truncate > the WAL on checkpoint and save twophase states into pg_twophase, > these files must store states of real transactions from past. Based on my analysis, the CLOG check ought to happen only in RecoverPreparedTransactions(), which is the last step taken at recovery. The future and past checks only need to happen in restoreTwoPhaseData(). I'm been bumping my head on my desk on this area for a few days now, but I think that the solution is simple: these checks should be moved *outside* ProcessTwoPhaseBuffer() and applied one layer higher in the two places I've mentioning above, where they matter. > There is another question about the loading order of twophase files > but I think it doesn't matter in which order we load these > files. But I would prefer to load it in full xid ascending order. I don't see a need in doing that in the short-term, but perhaps you have a point with the ordering of the entries in the TwoPhaseState shmem data if there are many entries to handle. Another idea would be to switch to a hash table. Note that I've actually found what looks like a potential data corruption bug in the logic while doing this investigation and adding TAP tests to automate all that: if we detect a 2PC file as already aborted or committed in ProcessTwoPhaseBuffer() while scanning TwoPhaseState->numPrepXacts, we could finish by calling PrepareRedoRemove(), which itself *manipulates* TwoPhaseState, decrementing numPrepXacts. This can leave dangling entries in TwoPhaseState if you have multiple TwoPhaseState entries (my implemented TAP tests have been doing exactly that). This is relevant at the end of recovery where CLOG lookups are OK to do. So the situation is a bit of a mess in all the branches, a bit worse in 17~ due to the fact that epochs are poorly integrated, but I'm getting there with a set of patches mostly ready to be sent, and I think that this would be OK for a backpatch. My plan is to start a new thread to deal with the matter, because that's a bit larger than the topic you have raised on this thread. I'll add you and Noah in CC for awareness. -- Michael
signature.asc
Description: PGP signature