"Simon Riggs" <[EMAIL PROTECTED]> writes:

> I've looked into this in more depth following your suggestion: I think
> it seems straightforward to move the xl_prev field from being a header
> to a trailer. That way when we do the test on the back pointer we will
> be assured that there is no torn page affecting the remainder of the
> xlrec. That would make it safer with wal_checksum = off.
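To make the trade-off concrete, here is a minimal Python sketch. It is not PostgreSQL code. The page layout, the assumed 512-byte atomic SECTOR size and every function name are hypothetical. The sketch simulates a torn 8kB write and compares three checks: xl_prev in a header only or a trailer only, in both a header and a trailer, and repeated at the start of every 512-byte sector.

```python
import struct

# Assumptions for this sketch (not PostgreSQL's actual WAL layout):
SECTOR = 512                 # assumed atomic write unit of the device
PAGE = 8192                  # 8kB page, as discussed in the thread
STAMP = struct.Struct("<Q")  # hypothetical 8-byte copy of xl_prev


def build_page(xl_prev: int) -> bytearray:
    """Build a page carrying xl_prev at the start of every sector
    (sector 0 doubles as the header) plus one trailer copy at the end."""
    page = bytearray(PAGE)
    for off in range(0, PAGE, SECTOR):
        STAMP.pack_into(page, off, xl_prev)
    STAMP.pack_into(page, PAGE - STAMP.size, xl_prev)
    return page


def torn_write(old: bytes, new: bytes, written: set[int]) -> bytearray:
    """Simulate a crash: only sectors whose indexes are in `written`
    reached disk (in any order); the rest still hold the old contents."""
    result = bytearray(old)
    for i in written:
        start = i * SECTOR
        result[start:start + SECTOR] = new[start:start + SECTOR]
    return result


def stamp_at(page: bytes, off: int) -> int:
    return STAMP.unpack_from(page, off)[0]


def header_ok(page: bytes, expected: int) -> bool:
    return stamp_at(page, 0) == expected


def trailer_ok(page: bytes, expected: int) -> bool:
    return stamp_at(page, PAGE - STAMP.size) == expected


def header_and_trailer_ok(page: bytes, expected: int) -> bool:
    return header_ok(page, expected) and trailer_ok(page, expected)


def every_sector_ok(page: bytes, expected: int) -> bool:
    return all(stamp_at(page, off) == expected for off in range(0, PAGE, SECTOR))


if __name__ == "__main__":
    old = build_page(0xAAAA)   # stale contents of a recycled WAL segment
    new = build_page(0xBBBB)   # the page we meant to write
    # Out-of-order tear: only the first and last sectors reached disk.
    ends_only = torn_write(old, new, {0, PAGE // SECTOR - 1})
    print(header_and_trailer_ok(ends_only, 0xBBBB))  # True: tear not detected
    print(every_sector_ok(ends_only, 0xBBBB))        # False: tear detected
```

In this model a header-only check misses a tear that wrote just the first half, and a trailer-only check misses one that wrote just the second half. The header-plus-trailer check catches both of those but still passes when only the two end sectors land. Only the per-sector stamp catches every partial write, provided the assumed 512-byte sector really is atomic.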
Hm. I think in practice this may actually help reduce the exposure to torn pages. In theory, however, there's no particular reason to think the blocks will be written out in physical order. The kernel may sync its buffers in an order dictated by its in-memory data structures and may come across the second half of the 8kB page before the first half. The second half may even lie earlier on disk than the first half if the filesystem started a new extent at that point.

If they were 4kB pages there would be fewer ways for them to be written out of order, but even then the hard drive could find a bad block and remap it. I'm not sure at what granularity drives remap; it may be less than 4kB.

To eliminate the need for the WAL CRC for everyone and still be safe from torn pages, I think you would need something like xl_prev repeated every 512 bytes throughout the page. But if this is only an option for systems that don't expect to suffer from torn pages, then sure, putting it in a footer seems like a good way to reduce the exposure somewhat. Putting it in both a header *and* a footer might be even better.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com