On Mon, Feb 26, 2018 at 05:08:49PM +0900, Michael Paquier wrote:
> This was mentioned back in 2001 by the way, but this did not count much
> for the case discussed here:
> https://www.postgresql.org/message-id/24901.995381770%40sss.pgh.pa.us
> The issue here is that the streaming case makes it easier to hit the
> problem as it opens more easily access to not-completely written WAL
> pages depending on the message frequency during replication. At the
> same time, we are discussing about a very low-probability issue. Note
> that if the XLOG reader is bumping into this problem, then at the next
> WAL receiver wake up, recovery would begin from the beginning of the
> last segment, and if the primary has produced some more WAL then the
> standby would be able to actually avoid the random junk. It is also
> possible to bypass the problem by zeroing manually the areas in
> question, or to actually wait for the standby to generate more WAL so as
> the garbage is overwritten automatically. And you really need to be
> very, very unlucky to have random garbage able to bypass the header
> validation checks.

By the way, while I have my mind on it: another strategy would be to
just make the checks in XLogReadRecord() a bit smarter when the whole
record header is not on the page. If we check at least for
AllocSizeIsValid(total_len), then this code would not fail on an
allocation as your user reported. Still, this misses the case where a
record size is lower than 1GB but invalid, so you would call
allocate_recordbuf() for nothing :( At least this extra check is
costless, and avoids all kinds of hard failures.
--
Michael
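
To make the extra check concrete, here is a minimal standalone sketch, not
the actual patch. The two macros are stand-ins mirroring the definitions in
PostgreSQL's memutils.h so that the snippet compiles on its own, and
record_length_is_allocatable() is a hypothetical helper name used only for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-ins mirroring PostgreSQL's memutils.h so this sketch compiles on
 * its own; real backend code would include the header instead.
 */
#define MaxAllocSize ((size_t) 0x3fffffff)	/* 1 gigabyte - 1 */
#define AllocSizeIsValid(size) ((size_t) (size) <= MaxAllocSize)

/*
 * Hypothetical helper (not an existing PostgreSQL function): decide whether
 * a total length read from a possibly incomplete record header is small
 * enough to be handed to an allocation at all.
 */
static bool
record_length_is_allocatable(uint32_t total_len)
{
	return AllocSizeIsValid(total_len);
}
```

A garbage length such as 0xFFFFFFFF is rejected before any allocation is
attempted, while a wrong but smaller value (say, 500MB) still passes, which
is the limitation described above.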