Re: [HACKERS] Block-level CRC checks

Greg Stark Mon, 17 Nov 2008 00:53:17 -0800

[sorry for top-posting - damn phone]

I thought of saying that too but it doesn't really solve the problem. Think of what happens if someone sets a hint bit on a dirty page.


greg

On 17 Nov 2008, at 08:26 AM, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:

Martijn van Oosterhout wrote:
On Fri, Nov 14, 2008 at 10:51:57AM -0500, Tom Lane wrote:
In fact, if the patch were to break torn-page handling, it would be
100% likely to be a net *decrease* in system reliability. It would add
detection of a situation that is not supposed to happen (ie, storage
system fails to return the same data it stored) at the cost of breaking
one's database when the storage system acts as it's expected and
documented to in a routine power-loss situation.
Ok, I see it's a problem because the hint changes are not WAL logged,
so torn pages are expected to work in normal operation. But simply
skipping the hint bits during checksumming is a terrible solution,
since then any errors in those bits will go undetected. To not be able
to say in the documentation that you'll detect 100% of single-bit
errors is pretty darn terrible, since that's kind of the goal of the
exercise.
Agreed, trying to explain that in the documentation would look like making excuses.
The requirement that all hint bit changes are WAL-logged seems like a pretty big change. I don't like doing that, just for CRCing.
There has been discussion before about not writing out pages to disk that only have hint-bit updates on them. That means that the next time the page is read, the reader needs to do the clog lookups and set the hint bits again. It's a tradeoff, making the first SELECT after modifying a page cheaper, I/O-wise, at the cost of making all subsequent SELECTs that need to read the page from disk or kernel cache more expensive, CPU-wise.
I'm not sure if I like that idea or not, but it would also solve the CRC problem with torn pages. FWIW, it would also solve the problem suggested with IBM DTLA disks and others that might zero-out a sector in case of an interrupted write. I'm not totally convinced that's a problem, as there's apparently other software that make the same assumption as we do, and we haven't heard of any torn-page corruption in real life, but still.
If we made the behavior configurable, that would be pretty hard to explain in the docs. We'd have three options with dependencies:
- CRC on/off
- write pages with only hint bit changes on/off
- full_page_writes on/off
If you disable full_page_writes, you're vulnerable to torn pages. If you enable it, you're not. Except if you also turn CRC on. Except if you also turn "write pages with only hint bit changes" off.
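To make the dependency chain above concrete, here is a minimal sketch that enumerates the eight combinations. The function and parameter names are hypothetical, and the rules encode only what the paragraph above says, not actual PostgreSQL behavior:

```python
from itertools import product

def torn_page_risk(full_page_writes: bool, crc: bool,
                   write_hint_only_pages: bool) -> bool:
    """Return True if torn pages are a problem, per the paragraph above.

    Hypothetical model for illustration; these are not real settings
    apart from the name full_page_writes.
    """
    if not full_page_writes:
        # No full-page images in WAL to restore a torn page from.
        return True
    if crc and write_hint_only_pages:
        # A torn write of a page with only (non-WAL-logged) hint bit
        # changes would fail the CRC check after a crash.
        return True
    return False

# Print the whole decision table.
for fpw, crc, hint in product([True, False], repeat=3):
    status = "vulnerable" if torn_page_risk(fpw, crc, hint) else "safe"
    print(f"full_page_writes={fpw!s:5} crc={crc!s:5} "
          f"write_hint_only_pages={hint!s:5} -> {status}")
```

Three interacting booleans already need a table to explain, which is the documentation burden being described.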
Unfortunately, there's not a lot of easy solutions here. You could do
two checksums, one with and one without hint bits. The overall checksum tells you if there's a problem. If it doesn't match, the second checksum will tell you if it's the hint bits or not (torn page problem). If it's
the hint bits you can reset them all and continue. The checksums need
not be of equal strength.
Hmm, that would work I guess.
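A rough sketch of the two-checksum idea, for illustration only. It assumes a page is a byte string, that a hypothetical hint_mask marks which bits are hint bits, and that both checksums are kept outside the checksummed bytes. CRC-32 from zlib stands in for both checksums, although they need not be of equal strength:

```python
import zlib

def mask_hints(page: bytes, hint_mask: bytes) -> bytes:
    """Return the page with every hint bit cleared (hint_mask is an assumption)."""
    return bytes(b & ~m & 0xFF for b, m in zip(page, hint_mask))

def make_checksums(page: bytes, hint_mask: bytes) -> tuple[int, int]:
    """Checksum the page twice: with hint bits, and with hint bits cleared."""
    return zlib.crc32(page), zlib.crc32(mask_hints(page, hint_mask))

def verify(page: bytes, hint_mask: bytes, full_crc: int, nohint_crc: int):
    """Classify a page read back from storage.

    Returns (status, usable_page). On a hint-bits-only mismatch the hint
    bits are reset (cleared) and the page is still usable.
    """
    if zlib.crc32(page) == full_crc:
        return "ok", page
    masked = mask_hints(page, hint_mask)
    if zlib.crc32(masked) == nohint_crc:
        return "hint-bits-only", masked
    return "corrupt", None

# Toy 3-byte "page": the low 4 bits of byte 0 are the hint bits.
page = bytes([0b10100000, 0x42, 0x17])
hint_mask = bytes([0b00001111, 0x00, 0x00])
full_crc, nohint_crc = make_checksums(page, hint_mask)

# Hint bits changed after the checksums were computed (torn-page case).
torn = bytes([page[0] | 0b0101]) + page[1:]
# A flipped bit in the non-hint data.
damaged = bytes([page[0], 0x43, page[2]])

print(verify(page, hint_mask, full_crc, nohint_crc)[0])
print(verify(torn, hint_mask, full_crc, nohint_crc)[0])
print(verify(damaged, hint_mask, full_crc, nohint_crc)[0])
```

Note that an error confined to the hint bits is indistinguishable from a torn hint-bit write in this scheme. It is repaired by resetting the bits, not reported as corruption.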
The extreme case is an ECC where you explicitly can set it so you can
alter N bits before you need to recalculate the checksum.
Computationally though, that sucks.
Yep. Also, in case of a torn page, you're very likely going to have several hint bits from the old image and several from the new image. An error-correcting code would need to be unfeasibly long to cope with that.
--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

