At Tue, 1 Aug 2023 12:43:21 +0900, Michael Paquier <mich...@paquier.xyz> wrote in
> A colleague, Ethan Mertz (in CC), has discovered that we don't handle
> correctly WAL records that are failing because of an OOM when
> allocating their required space. In the case of Ethan, we have bumped
> on the failure after an allocation failure on XLogReadRecordAlloc():
> "out of memory while trying to decode a record of length"

I believe a database server is not supposed to be run in such a
memory-constrained environment.

> In crash recovery, any records after the OOM would not be replayed.
> At quick glance, it seems to me that this can also impact standbys,
> where recovery could stop earlier than it should once a consistent
> point has been reached.

Actually, the code assumes that an OOM at that point happens solely
because of a broken record length field. I believe we made that
assumption intentionally.

> A patch is registered in the commit fest to improve the error
> detection handling, but as far as I can see it fails to handle the OOM
> case and replaces ReadRecord() to use a WARNING in the redo loop:
> https://www.postgresql.org/message-id/20200228.160100.2210969269596489579.horikyota.ntt%40gmail.com

That patch changes behavior only in the case where the last record is
followed by zeroed trailing bytes; nothing else is affected.

> On top of my mind, any solution I can think of needs to add more
> information to XLogReaderState, where we'd either track the type of
> error that happened close to errormsg_buf which is where these errors
> are tracked, but any of that cannot be backpatched, unfortunately.

One issue with changing that behavior is that there is no simple way
to detect a broken record before loading it into memory. We might be
able to implement a fallback mechanism, for example one that loads the
record into an already-allocated buffer (which is smaller than the
specified length) just to verify whether it is corrupted. However, I
question whether that is worth the additional complexity. And I'm not
sure what we should do if the first allocation fails.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center