Hello Thomas, On Mon, Sep 16, 2019 at 11:23 AM Thomas Munro <thomas.mu...@gmail.com> wrote: > > > 1. In UndoLogAllocateInRecovery, when we find the current log number > > from the list of registered blocks, we don't check whether the > > block->in_use flag is true or not. In XLogResetInsertion, we just > > reset in_use flag without reseting the blocks[]->rnode information. > > So, if we don't check the in_use flag, it's possible that we'll > > consult some block information from the previous WAL record. IMHO, > > just adding an in_use check in UndoLogAllocateInRecovery will solve > > the problem. > > Agreed. I added a line to break out of that loop if !block->in_use. > I think we should skip the block if !block->in_use. Because, the undo buffer can be registered in a subsequent block as well. For different operations, we can use different block_id to register the undo buffer in the redo record.
> BTW I am planning to simplify that code considerably, based on a plan > to introduce a new rule: there can be only one undo record and > therefore only one undo allocation per WAL record. > Okay. In that case, we need to rethink the cases for multi-inserts and non-inlace updates both of which currently inserts multiple undo record corresponding to a single WAL record. For multi-inserts, it can be solved easily by moving all the offset information in the payload. But, for non-inlace updates, we insert one undo record for the update and one for the insert. Wondering whether we've to insert two WAL records - one for update and one for the new insert. > > 2. A transaction, inserts one undo record and generated a WAL record > > for the same, say at WAL location 0/2000A000. Next, the undo record > > gets discarded and WAL is generated to update the meta.discard pointer > > at location 0/2000B000 At the same time, an ongoing checkpoint with > > checkpoint.redo at 0/20000000 flushes the latest meta.discard pointer. > > Now, the system crashes. > > Now, the recovery starts from the location 0/20000000. When the > > recovery of 0/2000A000 happens, it sees the undo record that it's > > about to insert, is already discarded as per meta.discard (flushed by > > checkpoint). In this case, should we just skip inserting the undo > > record? > > I see two options: > > 1. We make it so that if you're allocating in recovery and discard > > insert, we'll just set discard = insert so you can proceed. The code > in undofile_get_segment_file() already copes with missing files during > recovery. > Interesting. This should work. > > > 3. Currently, we create a backup image of the unlogged part of the > > undo log's metadata only when some backend allocates some space from > > the undo log (in UndoLogAllocate). This helps us restore the unlogged > > meta part after a checkpoint. > > When we perform an undo action, we also update the undo action > > progress and emit an WAL record. The same operation can performed by > > the undo worker which doesn't allocate any space from the undo log. > > So, if an undo worker emits an WAL record to update undo action > > progress after a checkpoint, it'll not be able to WAL log the backup > > image of the meta unlogged part. IMHO, this breaks the recovery logic > > of unlogged part of undo meta. > > I thought that was OK because those undo data updates don't depend on > the insert pointer. But I see what you mean: the next modification of > the page that DOES depend on the insert pointer might not log the > meta-data if it's not the first WAL record to touch it after a > checkpoint. Rats. I'll have to think about that some more. Cool. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com