On Mon, Sep 16, 2019 at 5:27 AM Kuntal Ghosh <kuntalghosh.2...@gmail.com> wrote: > While testing zheap over undo apis, we've found the following > issues/scenarios that might need some fixes/discussions:
Thanks! > 1. In UndoLogAllocateInRecovery, when we find the current log number > from the list of registered blocks, we don't check whether the > block->in_use flag is true or not. In XLogResetInsertion, we just > reset in_use flag without reseting the blocks[]->rnode information. > So, if we don't check the in_use flag, it's possible that we'll > consult some block information from the previous WAL record. IMHO, > just adding an in_use check in UndoLogAllocateInRecovery will solve > the problem. Agreed. I added a line to break out of that loop if !block->in_use. BTW I am planning to simplify that code considerably, based on a plan to introduce a new rule: there can be only one undo record and therefore only one undo allocation per WAL record. > 2. A transaction, inserts one undo record and generated a WAL record > for the same, say at WAL location 0/2000A000. Next, the undo record > gets discarded and WAL is generated to update the meta.discard pointer > at location 0/2000B000 At the same time, an ongoing checkpoint with > checkpoint.redo at 0/20000000 flushes the latest meta.discard pointer. > Now, the system crashes. > Now, the recovery starts from the location 0/20000000. When the > recovery of 0/2000A000 happens, it sees the undo record that it's > about to insert, is already discarded as per meta.discard (flushed by > checkpoint). In this case, should we just skip inserting the undo > record? I see two options: 1. We make it so that if you're allocating in recovery and discard > insert, we'll just set discard = insert so you can proceed. The code in undofile_get_segment_file() already copes with missing files during recovery. 2. We skip the insert as you said. I think option 1 is probably best, otherwise you have to cope with failure to insert by skipping, as you said. > 3. Currently, we create a backup image of the unlogged part of the > undo log's metadata only when some backend allocates some space from > the undo log (in UndoLogAllocate). This helps us restore the unlogged > meta part after a checkpoint. > When we perform an undo action, we also update the undo action > progress and emit an WAL record. The same operation can performed by > the undo worker which doesn't allocate any space from the undo log. > So, if an undo worker emits an WAL record to update undo action > progress after a checkpoint, it'll not be able to WAL log the backup > image of the meta unlogged part. IMHO, this breaks the recovery logic > of unlogged part of undo meta. I thought that was OK because those undo data updates don't depend on the insert pointer. But I see what you mean: the next modification of the page that DOES depend on the insert pointer might not log the meta-data if it's not the first WAL record to touch it after a checkpoint. Rats. I'll have to think about that some more. -- Thomas Munro https://enterprisedb.com