At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote in
> On 2021-Sep-02, Kyotaro Horiguchi wrote:
>
> > So, this is a crude PoC of that.
>
> I had ended up with something very similar, except I was trying to cram
> the flag via the checkpoint record instead of hacking
> AdvanceXLInsertBuffer().  I removed that stuff and merged both, here's
> the result.
>
> > 1. This patch is written on the current master, but it doesn't
> >    interfere with the seg-boundary-memorize patch since it removes the
> >    calls to RegisterSegmentBoundary.
>
> I rebased on top of the revert patch.

Thanks!

> > 2. Since xlogreader cannot emit a log-message immediately, we don't
> >    have a means to leave a log message to inform recovery met an
> >    aborted partial continuation record. (In this PoC, it is done by
> >    fprintf:p)
>
> Shrug.  We can just use an #ifndef FRONTEND / elog(LOG).  (I didn't keep
> this part, sorry.)

No problem, it was merely a development-time message for observing the
behavior.

> > 3. Maybe we need pg_waldump to show partial continuation records,
> >    but I'm not sure how to realize that.
>
> Ah yes, we'll need to fix that.

I just believe 0001 does the right thing.

0002:

> +	XLogRecPtr	abortedContrecordPtr; /* LSN of incomplete record at end of
> +									   * WAL */

The name sounds like the start LSN.  Wouldn't contrecordAbort(ed)Ptr work?

> 		if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD))
> 		{
> 			report_invalid_record(state,
> 								  "there is no contrecord flag at %X/%X",
> 								  LSN_FORMAT_ARGS(RecPtr));
> -			goto err;
> +			goto aborted_contrecord;

This loses the exclusion check between XLP_FIRST_IS_CONTRECORD and
_IS_ABORTED_PARTIAL.  Is it okay?  (I don't object to removing the check.)

I didn't think of this as an aborted contrecord at first, but on second
thought, when we see a record broken in any way, we stop recovery at
that point.  I agree with the change and all the similar changes.

+	/* XXX should we goto aborted_contrecord here? */

I think it should be aborted_contrecord.

When that happens, the loaded bytes actually looked like the first
fragment of a continuation record to xlogreader, even if the cause was
a broken total_len.  So if we abort the record there, the next time
xlogreader will meet XLP_FIRST_IS_ABORTED_PARTIAL at the same page and
correctly find a new record there.  On the other hand, if we just
errored out there, we would step back to the beginning of the broken
record in the previous page or segment and then start writing a new
record there, which is exactly what we want to avoid now.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
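
For illustration only, the difference between "goto aborted_contrecord"
and "goto err" argued above can be sketched roughly as below.  This is
not code from the patch: WalPos, PageVerdict and the three helper
functions are hypothetical names made up for this sketch, the bit value
of XLP_FIRST_IS_ABORTED_PARTIAL is an assumption, and the page header
size is ignored when computing positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for XLogRecPtr: a plain 64-bit WAL position. */
typedef uint64_t WalPos;

#define XLP_FIRST_IS_CONTRECORD      0x0001
/* ASSUMPTION: this bit value is picked for the sketch only. */
#define XLP_FIRST_IS_ABORTED_PARTIAL 0x0008

typedef enum
{
	PAGE_CONTINUES_RECORD,		/* keep reassembling the record */
	PAGE_ABORTED_CONTRECORD		/* continuation is missing */
} PageVerdict;

/*
 * Reader side: while reassembling a record that crosses a page boundary,
 * a page without the contrecord flag is treated as an aborted
 * continuation instead of a hard error.
 */
static inline PageVerdict
classify_continuation_page(uint16_t xlp_info)
{
	if (xlp_info & XLP_FIRST_IS_CONTRECORD)
		return PAGE_CONTINUES_RECORD;
	return PAGE_ABORTED_CONTRECORD;
}

/*
 * Writer side: flags left on the page where the continuation was
 * expected.  The two flags are kept mutually exclusive here.
 */
static inline uint16_t
mark_aborted(uint16_t xlp_info)
{
	return (uint16_t) ((xlp_info & ~XLP_FIRST_IS_CONTRECORD) |
					   XLP_FIRST_IS_ABORTED_PARTIAL);
}

/*
 * Where writing resumes after recovery.  Aborting the contrecord keeps
 * the broken first fragment and resumes at the page that lacked the
 * continuation; erroring out steps back and overwrites the fragment.
 */
static inline WalPos
next_write_position(WalPos broken_record_start,
					WalPos continuation_page_start,
					bool abort_contrecord)
{
	assert(broken_record_start < continuation_page_start);
	return abort_contrecord ? continuation_page_start : broken_record_start;
}
```

With abort_contrecord set, the new record begins on the page that gets
the aborted-partial flag, so a later reader finds it there; without it,
writing would restart at the beginning of the broken record.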