On Tue, Aug 01, 2023 at 04:39:54PM -0700, Jeff Davis wrote:
> On Tue, 2023-08-01 at 16:14 +0300, Aleksander Alekseev wrote:
> > Probably I'm missing something, but if memory allocation is required
> > during WAL replay and it fails, wouldn't it be a better solution to
> > log the error and terminate the DBMS immediately?
>
> We need to differentiate between:
>
> 1. No valid record exists and it must be the end of WAL; LOG and start
> up.
>
> 2. A valid record exists and we are unable to process it (e.g. due to
> OOM); PANIC.

Yes, still there is a bit more to it. The introduction of
palloc(MCXT_ALLOC_NO_OOM) partially comes from this thread, which
reported a problem that appeared when we switched from malloc() to
palloc() as xlogreader.c got introduced:
https://www.postgresql.org/message-id/CAHGQGwE46cJC4rJGv+kVMV8g6BxHm9dBR_7_QdPjvJUqdt7m=q...@mail.gmail.com

And the malloc() behavior when replaying WAL records is even older
than that.

At the end, we want to be able to give more options to anybody looking
at WAL records, and let them make decisions based on the error reached
and the state of the system. For example, it does not make much sense
to fail hard on OOM when replaying records in standby mode, because we
can just loop again. The same can actually be said when in crash
recovery. On OOM, the startup process currently considers that we have
an invalid record, which is incorrect. We could fail hard and FATAL to
replay again (sounds like the natural option), or we could loop over
the record that failed its allocation, repeating things. In any case,
we need to give more information back to the system so that it can
make better decisions on what it should do.
--
Michael
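
As a rough illustration of the distinction discussed above, here is a
minimal C sketch of a decision routine that keeps "end of WAL" and
"allocation failure" apart. This is not PostgreSQL code: ReadOutcome,
RecoveryAction, decide_action() and the max_attempts limit are all
hypothetical names invented for this example, and the retry policy for
crash recovery is only one of the two options mentioned above.

```c
#include <stdbool.h>

/* Hypothetical result of one attempt to read a WAL record. */
typedef enum ReadOutcome
{
    READ_OK,            /* a valid record was read */
    READ_END_OF_WAL,    /* no valid record exists: end of WAL */
    READ_OOM            /* allocation failed; a valid record may exist */
} ReadOutcome;

/* Hypothetical action the caller takes next. */
typedef enum RecoveryAction
{
    ACTION_APPLY,           /* replay the record */
    ACTION_FINISH_RECOVERY, /* LOG and start up */
    ACTION_RETRY,           /* loop over the same record again */
    ACTION_FAIL_HARD        /* stop, so that replay can be restarted */
} RecoveryAction;

/*
 * Decide what to do after a read attempt.
 *
 * failed_attempts counts consecutive allocation failures on the same
 * record, including the current one.  max_attempts is an assumed
 * limit used only in crash recovery.
 */
RecoveryAction
decide_action(ReadOutcome outcome, bool standby_mode,
              int failed_attempts, int max_attempts)
{
    switch (outcome)
    {
        case READ_OK:
            return ACTION_APPLY;

        case READ_END_OF_WAL:
            return ACTION_FINISH_RECOVERY;

        case READ_OOM:
            /* A standby can simply loop again. */
            if (standby_mode)
                return ACTION_RETRY;

            /* Crash recovery: retry a few times, then fail hard. */
            if (failed_attempts < max_attempts)
                return ACTION_RETRY;
            return ACTION_FAIL_HARD;
    }

    /* Not reached with a valid outcome; be conservative. */
    return ACTION_FAIL_HARD;
}
```

The point of the sketch is only that READ_OOM never maps to
ACTION_FINISH_RECOVERY: an allocation failure is reported as its own
outcome, so the caller can choose between looping and failing hard
instead of treating it as an invalid record.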