On Tue, Nov 9, 2010 at 11:26 AM, Jim Nasby <j...@nasby.net> wrote:
>> Huh, this implies that if we did go through all the work of
>> segregating the hint bits and could arrange that they all appear on
>> the same 512-byte sector and if we buffered them so that we were
>> writing the same bits we checksummed then we actually *could* include
>> them in the CRC after all since even a torn page will almost certainly
>> not tear an individual sector.
>
> If there's a torn page then we've crashed, which means we go through crash
> recovery, which puts a valid page (with valid CRC) back in place from the
> WAL. What am I missing?

The problem case is where hint bits have been set. Hint bits have
always been "we don't really care, but we write them".

A torn page on a hint-bit-only write is OK, because with a torn page
(assuming you don't get zeroed pages) you get a mix of old and new
chunks of the complete 8K buffer, and those chunks are identical except
for the hint bits, for which either the old or the new state is
sufficient.

But with a checksum, a torn page with only hint-bit updates now gets
noticed. Before, it might have happened, but we wouldn't have noticed
or cared.

So, to get checksums, we have to give up a few things:

1) zero-copy writes: we need to buffer the write to get a consistent
checksum (or lock the buffer tight)

2) saving hint bits on an otherwise unchanged page: we either need to
just not write that page, and lose the work the hint bits did, or do a
full-page WAL of it, so the torn-page checksum is fixed

Both of these are theoretical performance tradeoffs. How badly do we
want to verify on read that it is *exactly* what we thought we wrote?

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
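
For illustration, the torn-page case can be sketched as a toy simulation. This is not PostgreSQL code: the 4-byte CRC header, the single-bit "hint bit" and the helper names (`seal`, `verify`, `with_hint_bits`, `torn_write`) are all assumptions made up for the example. It only shows that a page torn between a pre-hint and post-hint image is harmless without a checksum, yet fails verification once the hint bits are covered by one.

```python
import struct
import zlib

PAGE_SIZE = 8192     # 8K buffer, as in the discussion
SECTOR_SIZE = 512    # unit assumed to be written atomically
HEADER = 4           # hypothetical layout: CRC32 stored in the first 4 bytes


def seal(payload: bytes) -> bytes:
    """Prefix the payload with a CRC32 computed over it."""
    return struct.pack("<I", zlib.crc32(payload)) + payload


def verify(page: bytes) -> bool:
    """Check the stored CRC32 against the payload actually on 'disk'."""
    (stored,) = struct.unpack("<I", page[:HEADER])
    return stored == zlib.crc32(page[HEADER:])


def with_hint_bits(page: bytes, offsets) -> bytes:
    """Set a made-up hint bit (0x01) at each payload offset and re-seal."""
    payload = bytearray(page[HEADER:])
    for off in offsets:
        payload[off] |= 0x01
    return seal(bytes(payload))


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Only the first N sectors of the new image reach disk before a crash."""
    cut = sectors_written * SECTOR_SIZE
    return new[:cut] + old[cut:]


def without_hints(page: bytes) -> bytes:
    """Payload with the hint bit masked out, i.e. the data we care about."""
    return bytes(b & 0xFE for b in page[HEADER:])


old_page = seal(bytes([0x10]) * (PAGE_SIZE - HEADER))
# One hint bit lands in the first half of the page, one in the second half.
new_page = with_hint_bits(old_page, [100, 5000])
torn = torn_write(old_page, new_page, sectors_written=8)

# The real data is intact: old, new and torn differ only in hint bits.
print(without_hints(torn) == without_hints(old_page))
# The checksum still flags the torn page, because the header CRC is the
# new one while the second half of the page carries the old hint bit.
print(verify(old_page), verify(new_page), verify(torn))
```

In this model the two remedies listed in the mail map to either never producing `new_page` for a hint-only change, or logging a full image of it so that recovery can overwrite `torn`.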