On Sat, Dec 8, 2018 at 7:52 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Tue, Dec 4, 2018 at 3:00 PM Dilip Kumar <dilipbal...@gmail.com> wrote: > > > > On Sat, Dec 1, 2018 at 12:58 PM Amit Kapila <amit.kapil...@gmail.com> > wrote: > > > > > > > > > 13. > > > PrepareUndoInsert() > > > { > > > .. > > > if (!UndoRecPtrIsValid(prepared_urec_ptr)) > > > + urecptr = UndoRecordAllocate(urec, 1, upersistence, txid); > > > + else > > > + urecptr = prepared_urec_ptr; > > > + > > > + size = UndoRecordExpectedSize(urec); > > > .. > > > > > > I think we should make above code a bit more bulletproof. As it is > > > written, there is no guarantee the size we have allocated is same as > > > we are using in this function. > > I agree > > How about if we take 'size' as output > > > parameter from UndoRecordAllocate and then use it in this function? > > > Additionally, we can have an Assert that the size returned by > > > UndoRecordAllocate is same as UndoRecordExpectedSize. > > With this change we will be able to guarantee when we are allocating > > single undo record > > but multi prepare will still be a problem. I haven't fix this as of > > now. I will think on how > > to handle both the cases when we have to prepare one time or when we > > have to allocate > > once and prepare multiple time. > > > > Yeah, this appears tricky. I think we can leave it as it is unless we > get some better idea. > > 1. > +void > +UndoRecordOnUndoLogChange(UndoPersistence persistence) > +{ > + prev_txid[persistence] = InvalidTransactionId; > +} > > Are we using this API in this or any other patch or in zheap? I see > this can be useful API, but unless we have a plan to use it, I don't > think we should retain it. > Currently, we are not using it so removed. 2. > + * Handling multi log: > + * > + * It is possible that the undo record of a transaction can be spread > across > + * multiple undo log. And, we need some special handling while > inserting the > + * undo for discard and rollback to work sanely. > + * > + * If the undorecord goes to next log then we insert a transaction > header for > + * the first record in the new log and update the transaction header > with this > + * new log's location. This will allow us to connect transactions across > logs > + * when the same transaction span across log (for this we keep track of > the > + * previous logno in undo log meta) which is required to find the latest > undo > + * record pointer of the aborted transaction for executing the undo > actions > + * before discard. If the next log get processed first in that case we > + * don't need to trace back the actual start pointer of the transaction, > + * in such case we can only execute the undo actions from the current log > + * because the undo pointer in the slot will be rewound and that > will be enough > + * to avoid executing same actions. However, there is possibility that > after > + * executing the undo actions the undo pointer got discarded, now in > later > + * stage while processing the previous log it might try to fetch the undo > + * record in the discarded log while chasing the transaction header > chain. > + * To avoid this situation we first check if the next_urec of the > transaction > + * is already discarded then no need to access that and start executing > from > + * the last undo record in the current log. > > I think I see the problem in the discard mechanism when the log is > spread across multiple logs. Basically, if the second log contains > undo of some transaction prior to the transaction which has just > decided to spread it's undo in the chosen undo log, then we might > discard the undo log of some transaction(s) inadvertently. Am, I > missing something? Actually, I don't see exactly this problem here because we only process one undo log at a time, so we will not got to the next undo log and discard some transaction for which we supposed to retain the undo. > If not, then I guess we need to ensure that we > don't immediately discard the undo in the second log when a single > transactions undo is spreaded across two logs > > Before choosing a new undo log to span the undo for a transaction, do > we ensure that it is not already linked with some other undo log for a > similar reason? > > One more thing in this regard: > +UndoRecordAllocate(UnpackedUndoRecord *undorecords, int nrecords, > + TransactionId txid, UndoPersistence upersistence) > { > .. > .. > + if (InRecovery) > + urecptr = UndoLogAllocateInRecovery(txid, size, upersistence); > + else > + urecptr = UndoLogAllocate(size, upersistence); > + > + log = UndoLogGet(UndoRecPtrGetLogNo(urecptr), false); > + > + /* > + * By now, we must be attached to some undo log unless we are in recovery. > + */ > + Assert(AmAttachedToUndoLog(log) || InRecovery); > + > + /* > + * We can consider the log as switched if this is the first record of the > + * log and not the first record of the transaction i.e. same transaction > + * continued from the previous log. > + */ > + if ((UndoRecPtrGetOffset(urecptr) == UndoLogBlockHeaderSize) && > + log->meta.prevlogno != InvalidUndoLogNumber) > + log_switched = true; > .. > .. > } > > Isn't there a hidden assumption in the above code that you will always > get a fresh undo log if the undo doesn't fit in the currently attached > log? What is the guarantee of same? > Yeah it's a problem, we might get the undo which may not be empty. One way to avoid this could be that instead of relying on the check "UndoRecPtrGetOffset(urecptr) == UndoLogBlockHeaderSize". We can add some flag to the undo log meta data that whether it's the first record after attach or not and based on that we can decide. But, I want to think some better solution where we can identify without adding anything extra to undo meta. 3. > +/* > + * Call PrepareUndoInsert to tell the undo subsystem about the undo > record you > + * intended to insert. Upon return, the necessary undo buffers are > pinned and > + * locked. > + * This should be done before any critical section is established, since > it > + * can fail. > + * > + * If not in recovery, 'xid' should refer to the top transaction id > because > + * undo log only stores mapping for the top most transactions. > + * If in recovery, 'xid' refers to the transaction id stored in WAL. > + */ > +extern UndoRecPtr PrepareUndoInsert(UnpackedUndoRecord *, TransactionId > xid, > + UndoPersistence); > + > +/* > + * Insert a previously-prepared undo record. This will write the actual > undo > + * record into the buffers already pinned and locked in > PreparedUndoInsert, > + * and mark them dirty. For persistent undo, this step should be > performed > + * after entering a critical section; it should never fail. > + */ > +extern void InsertPreparedUndo(void); > > This text is duplicate of what is mentioned in .c file, so I have > removed it in delta patch. Similarly, I have removed duplicate text > atop other functions exposed via undorecord.h > > 4. > +/* > + * Forget about any previously-prepared undo record. Error recovery calls > + * this, but it can also be used by other code that changes its mind about > + * inserting undo after having prepared a record for insertion. > + */ > +extern void CancelPreparedUndo(void); > > This API is nowhere defined or used. What is the intention? > Not required > 5. > +typedef struct UndoRecordHeader > +{ > .. > + /* > + * Transaction id that has modified the tuple present in this undo record. > + * If this is older then RecentGlobalXmin, then we can consider the tuple > + * in this undo record as visible. > + */ > + TransactionId urec_prevxid; > .. > > /then/than > Done > > I think we need to mention oldestXidWithEpochHavingUndo instead of > RecentGlobalXmin. > Merged > > 6. > + * If UREC_INFO_BLOCK is set, an UndoRecordBlock structure follows. > + * > + * If UREC_INFO_PAYLOAD is set, an UndoRecordPayload structure follows. > + * > + * When (as will often be the case) multiple structures are present, they > + * appear in the same order in which the constants are defined here. > That is, > + * UndoRecordRelationDetails appears first. > + */ > +#define UREC_INFO_RELATION_DETAILS 0x01 > > I have expanded this comments in the attached delta patch. Merged > I think we > should remove the define UREC_INFO_PAYLOAD_CONTAINS_SLOT from the > patch as this is zheap specific and should be added later along with > the zheap code. > Yeah we can, so removed in my new patch. 7. > +/* > + * Additional information about a relation to which this record pertains, > + * namely the tablespace OID and fork number. If the tablespace OID is > + * DEFAULTTABLESPACE_OID and the fork number is MAIN_FORKNUM, this > structure > + * can (and should) be omitted. > + */ > +typedef struct UndoRecordRelationDetails > +{ > + ForkNumber urec_fork; /* fork number */ > +} UndoRecordRelationDetails; > > This comment seems to be out-dated, so modified in the attached delta > patch. > Merged > > 8. > +typedef struct UndoRecordTransaction > +{ > + uint32 urec_progress; /* undo applying progress. */ > + uint32 urec_xidepoch; /* epoch of the current transaction */ > > Can you expand comments about how the progress is defined and used? > Moved your comment from UnpackedUndoRecord to this structure and in UnpackedUndoRecord I have mentioned that we can refer detailed comment in this structure. > Also, write a few sentences about why epoch is captured and or used? urec_xidepoch is captured mainly for the zheap visibility purpose so isn't it good that we mention it there? > 9. > +#define urec_next_pos \ > + (SizeOfUndoRecordTransaction - SizeOfUrecNext) > > What is its purpose? > It's not required so removed > > 10. > +/* > + * Set uur_info for an UnpackedUndoRecord appropriately based on which > + * other fields are set. > + */ > +extern void UndoRecordSetInfo(UnpackedUndoRecord *uur); > + > +/* > + * Compute the number of bytes of storage that will be required to insert > + * an undo record. Sets uur->uur_info as a side effect. > + */ > +extern Size UndoRecordExpectedSize(UnpackedUndoRecord *uur); > > Again, I see duplicate text in .h and .c files, so removed this and > similar comments from .h files. I have tried to move some part of > comments from .h to .c file, so that it is easier to read from one > place rather than referring at two places. See, if I have missed > anything. > > Apart from the above, I have done few other cosmetic changes in the > attached delta patch, see, if you like those, kindly include it in > the main patch. Done
0003-undo-interface-v11.patch
Description: Binary data