>> In the WAL we just need to be able to detect torn pages and stop
>> reading WAL at that point. That's easier and doesn't really need a
>> CRC. We could just adopt the Sybase strategy of storing a unique id
>> number every 512 bytes throughout the WAL page. If those numbers don't
>> match then we have a torn page; the system crashed at that point and
>> we should stop reading WAL pages.
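
To make the idea quoted above concrete, here is a minimal sketch of
per-sector stamping in C. It is not PostgreSQL or Sybase code. The page
size, the sector size, the stamp position (last 4 bytes of each sector)
and the names stamp_page() and page_is_torn() are all assumptions made
up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WAL_PAGE_SIZE    8192  /* assumed WAL page size */
#define SECTOR_SIZE      512   /* assumed unit the disk writes atomically */
#define SECTORS_PER_PAGE (WAL_PAGE_SIZE / SECTOR_SIZE)

/* Hypothetical layout: the stamp occupies the last 4 bytes of each sector. */
#define STAMP_OFFSET     (SECTOR_SIZE - sizeof(uint32_t))

/* Write the same id into the stamp slot of every sector of the page. */
static void stamp_page(unsigned char *page, uint32_t id)
{
    assert(page != NULL);
    for (size_t s = 0; s < SECTORS_PER_PAGE; s++)
        memcpy(page + s * SECTOR_SIZE + STAMP_OFFSET, &id, sizeof(id));
}

/* Return true if any sector's stamp differs from the first sector's stamp. */
static bool page_is_torn(const unsigned char *page)
{
    uint32_t first, cur;

    assert(page != NULL);
    memcpy(&first, page + STAMP_OFFSET, sizeof(first));
    for (size_t s = 1; s < SECTORS_PER_PAGE; s++)
    {
        memcpy(&cur, page + s * SECTOR_SIZE + STAMP_OFFSET, sizeof(cur));
        if (cur != first)
            return true;   /* mixed stamps: only part of the page hit disk */
    }
    return false;
}
```

The sketch only works if the id changes each time the same on-disk page
is rewritten (a per-page write counter, for instance), and it leaves out
how the payload bytes displaced by the stamps would be stored.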
> I've looked into this in more depth following your
> suggestion: I think it seems straightforward to move the
> xl_prev field from being a header to a trailer. That way when
> we do the test on the back pointer we will be assured that
> there is no torn page affecting the remainder of the xlrec.
> That would make it safer with wal_checksum = off.

I do not think we can assume any particular order in which blocks are
written to disk. I think all of this can only be used on OS and hardware
combinations that can guarantee that what pg writes in one IO call
(typically 8k) is safe. Those combinations do exist, so I think we want
the switch.

Putting xl_prev at the end helps only for direct IO WAL sync methods;
otherwise we would need it on every page.

Andreas