On 2021-Sep-03, Kyotaro Horiguchi wrote: > At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera <alvhe...@alvh.no-ip.org> > wrote in
> 0002: > > + XLogRecPtr abortedContrecordPtr; /* LSN of incomplete record at > > end of > > + * > > WAL */ > > The name sounds like the start LSN. doesn't contrecordAbort(ed)Ptr work? I went over various iterations of the name of this, and still not entirely happy. I think we need to convey the ideas that * This is the endptr+1 of the known-good part of the record, that is, the beginning of the next part of the record. I think "endPtr" summarizes this well; we use this name elsewhere. * At some point before recovery, this was the last WAL record that existed * there is an invalid contrecord, or we were looking for a contrecord and found invalid data * this record is incomplete So maybe 1. incompleteRecEndPtr 2. finalInvalidRecEndPtr 3. brokenContrecordEndPtr > > if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD)) > > { > > report_invalid_record(state, > > > > "there is no contrecord flag at %X/%X", > > > > LSN_FORMAT_ARGS(RecPtr)); > > - goto err; > > + goto aborted_contrecord; > > This loses the exclusion check between XLP_FIRST_IS_CONTRECORD and > _IS_ABROTED_PARTIAL. Is it okay? (I don't object to remove the check.). Yeah, I was unsure about that. I think it's good to have it as a cross-check, though it should never occur. I'll put it back. Another related point is whether it's a good idea to have the ereport() about the bit appearing in a not-start-of-page address being a PANIC. If we degrade to WARNING then it'll be lost in the noise, but I'm not sure what else can we do. (If it's a PANIC, then you end up with an unusable database). > I didin't thought this as an aborted contrecord. but on second > thought, when we see a record broken in any style, we stop recovery at > the point. I agree to the change and all the silmiar changes. > > + /* XXX should we goto > aborted_contrecord here? */ > > I think it should be aborted_contrecord. > > When that happens, the loaded bytes actually looked like the first > fragment of a continuation record to xlogreader, even if the cause > were a broken total_len. So if we abort the record there, the next > time xlogreader will meet XLP_FIRST_IS_ABORTED_PARTIAL at the same > page, and correctly finds a new record there. > > On the other hand if we just errored-out there, we will step-back to > the beginning of the broken record in the previous page or segment > then start writing a new record there but that is exactly what we want > to avoid now. Hmm, yeah, makes sense. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/