On 2021-Sep-03, Kyotaro Horiguchi wrote:

> At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera <alvhe...@alvh.no-ip.org> 
> wrote in 

> 0002:
> > +   XLogRecPtr      abortedContrecordPtr; /* LSN of incomplete record at 
> > end of
> > +                                                                      * 
> > WAL */
> 
> The name sounds like the start LSN. doesn't contrecordAbort(ed)Ptr work?

I went over various iterations of the name of this, and still not
entirely happy.  I think we need to convey the ideas that

* This is the endptr+1 of the known-good part of the record, that is,
  the beginning of the next part of the record.  I think "endPtr"
  summarizes this well; we use this name elsewhere.

* At some point before recovery, this was the last WAL record that
  existed

* there is an invalid contrecord, or we were looking for a contrecord
  and found invalid data

* this record is incomplete

So maybe
1. incompleteRecEndPtr
2. finalInvalidRecEndPtr
3. brokenContrecordEndPtr

> >                     if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD))
> >                     {
> >                             report_invalid_record(state,
> >                                                                       
> > "there is no contrecord flag at %X/%X",
> >                                                                       
> > LSN_FORMAT_ARGS(RecPtr));
> > -                           goto err;
> > +                           goto aborted_contrecord;
> 
> This loses the exclusion check between XLP_FIRST_IS_CONTRECORD and
> _IS_ABROTED_PARTIAL.  Is it okay?  (I don't object to remove the check.).

Yeah, I was unsure about that.  I think it's good to have it as a
cross-check, though it should never occur.  I'll put it back.

Another related point is whether it's a good idea to have the ereport()
about the bit appearing in a not-start-of-page address being a PANIC.
If we degrade to WARNING then it'll be lost in the noise, but I'm not
sure what else can we do.  (If it's a PANIC, then you end up with an
unusable database).

> I didin't thought this as an aborted contrecord. but on second
> thought, when we see a record broken in any style, we stop recovery at
> the point. I agree to the change and all the silmiar changes.
> 
> +                                     /* XXX should we goto 
> aborted_contrecord here? */
> 
> I think it should be aborted_contrecord.
> 
> When that happens, the loaded bytes actually looked like the first
> fragment of a continuation record to xlogreader, even if the cause
> were a broken total_len.  So if we abort the record there, the next
> time xlogreader will meet XLP_FIRST_IS_ABORTED_PARTIAL at the same
> page, and correctly finds a new record there.
> 
> On the other hand if we just errored-out there, we will step-back to
> the beginning of the broken record in the previous page or segment
> then start writing a new record there but that is exactly what we want
> to avoid now.

Hmm, yeah, makes sense.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/


Reply via email to