On Wed, Dec 25, 2024 at 07:18:18PM +0300, Vitaly Davydov wrote: > I tried your patch and it seems the server is started > successfully. But I've found another problem in my synthetic test - > it can not remove the file with the following message: > > LOG: database system was not properly shut down; automatic recovery in > progress > WARNING: removing future two-phase state file for transaction 1045224 > WARNING: could not remove file "pg_twophase/FFFFFFFF000FF2E8": No such file > or directory > LOG: redo starts at 0/1762978
That works properly up to v16~. I have slept over this problem and I think that we'd better backpatch the fix for the first problem. Now, I also want to add a test to provide coverage. As you say, that's simple: - Create an empty file with a future XID name - Restart the server. - Scan the logs for the WARNING where the future file is removed. > The fill will never be removed automatically. > I guess, it is because we incorrectly calculate the two-phase file > name using TwoPhaseFilePath in RemoveTwoPhaseFile in this > scenario. It can be fixed if to pass file path directly from > RecoverPreparedTransactions or StandbyRecoverPreparedTransaction > into ProcessTwoPhaseBuffer -> RemoveTwoPhaseFile. Yeah, I've noticed that as well. We are dealing with a second bug that affects 17~ caused by the switch to FullTransactionIds for 2PC file names, which had been introduced in 5a1dfde8334b (pretty sure that it is the culprit, did not bisect). Attempting to remove a future file triggers an assertion failure: #3 0x00007fed046324f0 in __GI_abort () at ./stdlib/abort.c:79 #4 0x0000563e68a9bd07 in ExceptionalCondition (conditionName=0x563e68c66fc3 "epoch > 0", fileName=0x563e68c66b0b "twophase.c", lineNumber=953) at assert.c:66 #5 0x0000563e67683060 in AdjustToFullTransactionId (xid=4095) at twophase.c:953 #6 0x0000563e67683092 in TwoPhaseFilePath (path=0x7ffcb515b4a0 "h\307!\227>V", xid=4095) at twophase.c:963 #7 0x0000563e67686603 in RemoveTwoPhaseFile (xid=4095, giveWarning=true) at twophase.c:1728 #8 0x0000563e67688989 in ProcessTwoPhaseBuffer (xid=4095, prepare_start_lsn=0, fromdisk=true, setParent=false, setNextXid=false) at twophase.c:2219 #9 0x0000563e676872a6 in restoreTwoPhaseData () at twophase.c:1924 #10 0x0000563e676b5c8b in StartupXLOG () at xlog.c:5642 So this means that we are dealing with two different bugs, and that we need to fix the assertion failure of the second problem first down to 17, then fix the first problem on all stable branches with a test to cover both the first and second problems. > I did it in the proposed patch > https://www.postgresql.org/message-id/cedbe-65e0c000-1-6db17700%40133269862 > (it is incomplete now). I suspect that this can be still dangerous as-is while complicating the code with more possible paths for the removal of the 2PC files, because we may still build a file name from an XID, and that's what's causing this issue.. -- Michael
signature.asc
Description: PGP signature