At Wed, 9 Aug 2023 15:03:21 +0900, Michael Paquier <mich...@paquier.xyz> wrote in > I have spent more time on 0001, polishing it and fixing a few bugs > that I have found while reviewing the whole. Most of them were > related to mistakes in resetting the error state when expected. I > have also expanded DecodeXLogRecord() to use an error structure > instead of only an errmsg, giving more consistency. The error state > now relies on two structures: > +typedef enum XLogReaderErrorCode > +{ > + XLOG_READER_NONE = 0, > + XLOG_READER_OOM, /* out-of-memory */ > + XLOG_READER_INVALID_DATA, /* record data */ > +} XLogReaderErrorCode; > +typedef struct XLogReaderError > +{ > + /* Buffer to hold error message */ > + char *message; > + bool message_deferred; > + /* Error code when filling *message */ > + XLogReaderErrorCode code; > +} XLogReaderError; > > I'm kind of happy with this layer, now.
I'm not certain if message_deferred is a property of the error struct. Callers don't seem to need that information. The name "XLOG_RADER_NONE" seems too generic. XLOG_READER_NOERROR will be clearer. > I have also spent some time on finding a more elegant solution for the > WAL replay, relying on the new facility from 0001. And it happens > that it is easy enough to loop if facing an out-of-memory failure when > reading a record when we are in crash recovery, as the state is > actually close to what a standby does. The trick is that we should > not change the state and avoid tracking a continuation record. This > is done in 0002, making replay more robust. With the addition of the 0002 shifts the behavior for the OOM case from ending recovery to retrying at the same record. If the last record is really corrupted, the server won't be able to finish recovery. I doubt we are good with this behavior change. > error injection tweak in 0003, I am able to finish recovery while the > startup process loops if under memory pressure. As mentioned > previously, there are more code paths to consider, but that's a start > to fix the data loss problems. (The file name "xlogreader_oom" is a bit trickeier to type than "hoge" or "foo"X( ) > Comments are welcome. regards. -- Kyotaro Horiguchi NTT Open Source Software Center