On Thu, Nov 26, 2020 at 10:43 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> I think what you need to do to reproduce this is to follow the
> snapshot machinery in SnapBuildFindSnapshot. Basically, first, start a
> transaction (say transaction-id is 500) and do some operations but
> don't commit. Here, if you create a slot (via subscription or
> otherwise), it will wait for 500 to complete and make the state as
> SNAPBUILD_BUILDING_SNAPSHOT. Here, you can commit 500 and then having
> debugger in that state, start another transaction (say 501), do some
> operations but don't commit. Next time when you reach this function,
> it will change the state to SNAPBUILD_FULL_SNAPSHOT and wait for 501,
> now you can start another transaction (say 502) which you can prepare
> but don't commit. Again start one more transaction 503, do some ops,
> commit both 501 and 503. At this stage somehow we need to ensure that
> XLOG_RUNNING_XACTS record. Then commit prepared 502. Now, I think you
> should notice that the consistent point is reached after 502's prepare
> and before its commit. Now, this is just a theoretical scenario, you
> need something on these lines and probably a way to force
> XLOG_RUNNING_XACTS WAL (probably via debugger or some other way) at
> the right times to reproduce it.
>
> Thanks for trying to build a test case for this, it is really helpful.

I tried the above steps. I was able to get the builder state to
SNAPBUILD_BUILDING_SNAPSHOT, but was not able to get it into the
SNAPBUILD_FULL_SNAPSHOT state. Instead, the code moves straight from
SNAPBUILD_BUILDING_SNAPSHOT to the SNAPBUILD_CONSISTENT state.
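For reference, the transitions under discussion can be written down as a
toy model. This is only a sketch, not the code in snapbuild.c: all the
Model* names are hypothetical, xid wraparound is ignored, and the
restore-from-disk path, the initial xmin horizon check, and the waiting
on running transactions are left out. It assumes each
XLOG_RUNNING_XACTS record is reduced to its oldestRunningXid and nextXid.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-ins for the builder states named in this thread. */
typedef enum {
    MODEL_START,
    MODEL_BUILDING_SNAPSHOT,
    MODEL_FULL_SNAPSHOT,
    MODEL_CONSISTENT
} ModelState;

/* Hypothetical summary of one XLOG_RUNNING_XACTS record. */
typedef struct {
    uint32_t oldest_running_xid;
    uint32_t next_xid;
} ModelRunningXacts;

typedef struct {
    ModelState state;
    uint32_t   next_phase_at;  /* nextXid remembered at the last transition */
} ModelBuilder;

const char *model_state_name(ModelState s)
{
    switch (s) {
    case MODEL_START:             return "START";
    case MODEL_BUILDING_SNAPSHOT: return "BUILDING_SNAPSHOT";
    case MODEL_FULL_SNAPSHOT:     return "FULL_SNAPSHOT";
    case MODEL_CONSISTENT:        return "CONSISTENT";
    }
    return "?";
}

/* Apply one running-xacts record to the model builder. */
void model_step(ModelBuilder *b, ModelRunningXacts r)
{
    /* Assumption: plain integer comparison, xid wraparound is ignored. */
    assert(r.oldest_running_xid <= r.next_xid);

    if (b->state == MODEL_CONSISTENT)
        return;

    if (r.oldest_running_xid == r.next_xid) {
        /* Nothing was running: jump straight to consistent. */
        b->state = MODEL_CONSISTENT;
    } else if (b->state == MODEL_START) {
        b->state = MODEL_BUILDING_SNAPSHOT;
        b->next_phase_at = r.next_xid;
    } else if (b->state == MODEL_BUILDING_SNAPSHOT &&
               b->next_phase_at <= r.oldest_running_xid) {
        b->state = MODEL_FULL_SNAPSHOT;
        b->next_phase_at = r.next_xid;
    } else if (b->state == MODEL_FULL_SNAPSHOT &&
               b->next_phase_at <= r.oldest_running_xid) {
        b->state = MODEL_CONSISTENT;
    }
    /* Otherwise the builder stays where it is and waits for more records. */
}

/* Replay a sequence of records from MODEL_START and print each step. */
ModelState model_replay(const ModelRunningXacts *recs, int n)
{
    ModelBuilder b = { MODEL_START, 0 };

    for (int i = 0; i < n; i++) {
        model_step(&b, recs[i]);
        printf("oldestRunningXid=%u nextXid=%u -> %s\n",
               (unsigned) recs[i].oldest_running_xid,
               (unsigned) recs[i].next_xid,
               model_state_name(b.state));
    }
    return b.state;
}
```

In this model, the records (500, 501), (501, 502), (502, 504) walk
through BUILDING_SNAPSHOT, FULL_SNAPSHOT and CONSISTENT in turn, which
corresponds to the quoted scenario with 502 still prepared at the last
record. The records (500, 501), (501, 501) go from BUILDING_SNAPSHOT
directly to CONSISTENT, because nothing is running when the second
record is logged.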

In the function SnapBuildFindSnapshot, one of two things happens. Either
the following check fails:

1327: TransactionIdPrecedesOrEquals(SnapBuildNextPhaseAt(builder),
                                    running->oldestRunningXid))

because SnapBuildNextPhaseAt (which is the same as running->nextXid) is
higher than oldestRunningXid, or, when the two are equal, control falls
into this condition earlier in the function:

1247: if (running->oldestRunningXid == running->nextXid)

and the builder moves straight into the SNAPBUILD_CONSISTENT state. At
no point will the nextXid be less than oldestRunningXid.

In my sessions, I commit multiple transactions, hoping to bump up
oldestRunningXid. I run checkpoints and have made sure that
XLOG_RUNNING_XACTS records are being inserted. But on each pass through
SnapBuildFindSnapshot with a new XLOG_RUNNING_XACTS record,
oldestRunningXid advances only one xid at a time, so it eventually
catches up with running->nextXid and the builder reaches
SNAPBUILD_CONSISTENT without ever entering the SNAPBUILD_FULL_SNAPSHOT
state.

regards,
Ajin Cherian
Fujitsu Australia