On 3/8/13 4:40 PM, Greg Stark wrote:
> On Fri, Mar 8, 2013 at 5:46 PM, Josh Berkus <j...@agliodbs.com> wrote:
>> After some examination of the systems involved, we concluded that the
>> issue was the FreeBSD drivers for the new storage, which were unstable
>> and had custom source patches.  However, without PostgreSQL checksums,
>> we couldn't *prove* it wasn't PostgreSQL at fault.  It ended up taking
>> weeks of testing, most of which was useless, to prove to them they had a
>> driver problem so it could be fixed.  If Postgres had had checksums, we
>> could have avoided wasting a couple of weeks looking for non-existent
>> PostgreSQL bugs.
>
> How would Postgres checksums have proven that?

It's hard to prove this sort of thing definitively. I see this more as a source of evidence that can increase confidence that the database is doing the right thing, most usefully in a replication environment. Systems that care about data integrity nowadays are running with a WAL-shipping replica of some sort. Right now there's no good way to compare the master and standby copies of a data block, to figure out which is likely to be the better one. In a checksum environment, here's a new troubleshooting workflow that becomes possible:

1) Checksum error happens on the master.
2) The same block is checked on the standby. It has the same 16-bit checksum but different data, and its checksum matches its data.
3) The copy of that block on the standby, which was shipped over the network instead of being stored locally, is probably good.
4) The database must have been consistent when the data was in RAM on the master.
5) Conclusion: there's probably something wrong at a storage layer below the database on the master.
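To make the workflow concrete, here's a minimal Python sketch of the comparison logic. The checksum function here is a plain 16-bit word sum, a stand-in only -- it is NOT PostgreSQL's actual page checksum algorithm -- and the `diagnose` helper is a hypothetical name for illustration:

```python
PAGE_SIZE = 8192  # PostgreSQL's default block size

def page_checksum(page: bytes) -> int:
    """Illustrative 16-bit checksum: sum of big-endian 16-bit words
    mod 2**16.  A stand-in, not PostgreSQL's real algorithm."""
    total = 0
    for i in range(0, len(page), 2):
        total = (total + int.from_bytes(page[i:i + 2], "big")) & 0xFFFF
    return total

def diagnose(master_page, master_stored, standby_page, standby_stored):
    """Guess which copy is good, following the workflow above."""
    master_ok = page_checksum(master_page) == master_stored
    standby_ok = page_checksum(standby_page) == standby_stored
    if master_ok and standby_ok:
        return "both copies verify"
    if not master_ok and standby_ok:
        # The master's stored checksum matched its data at write time,
        # but the data on disk no longer matches it, while the
        # network-shipped standby copy still does: suspect the
        # master's storage layer.
        return "suspect master storage"
    if master_ok and not standby_ok:
        return "suspect standby storage"
    return "both copies corrupt"

# Simulate a bad driver flipping one bit in the master's on-disk copy.
good_page = bytes(PAGE_SIZE)
stored = page_checksum(good_page)
flipped = bytes([good_page[0] ^ 0x01]) + good_page[1:]
print(diagnose(flipped, stored, good_page, stored))  # suspect master storage
```

The key property is step 2 from the list: both servers hold the same stored checksum, so when one copy of the data still matches it and the other doesn't, the mismatch points below the database.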

Now, of course this doesn't automatically point the finger correctly for every kind of corruption. But this example is a situation I've seen in the real world, when a bad driver flips a random bit in a block. If Josh had been able to show his client that the standby server built from streaming replication was just fine, and that corruption was limited to the master, that wouldn't *prove* the database isn't the problem. But it would usefully shift the perception of which faults are likely and which are unlikely away from it. Right now, when I see master/standby differences in data blocks, it's the old problem of telling the true time when you have two clocks. Having a checksum helps pick the right copy when there is more than one and one has been corrupted by storage-layer issues.

> If I understand the performance issues right, the main problem is the
> extra round trip to the WAL log, which can require a sync. Is that
> right?

I don't think this changes things such that there is a second fsync per transaction. That is a worthwhile test workload to add, though. Right now the tests Jeff and I have run have specifically avoided systems with slow fsync, because you can't really test the CPU/memory overhead very well if you're hitting the rotational latency bottleneck.
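To show why slow fsync masks the overhead being measured: even outside PostgreSQL, a bare write-plus-fsync loop typically costs milliseconds per call on rotating media, which dwarfs the microsecond-scale CPU cost of checksumming a page. A rough measurement sketch (ordinary Python, not PostgreSQL code; `avg_fsync_latency` is a hypothetical helper):

```python
import os
import tempfile
import time

def avg_fsync_latency(iterations: int = 20, block_size: int = 8192) -> float:
    """Average seconds per write+fsync cycle on this machine's storage.
    Each fsync forces the data to stable storage -- the same wait a
    WAL commit pays."""
    fd, path = tempfile.mkstemp()
    try:
        block = b"\x00" * block_size
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, block)
            os.fsync(fd)
        return (time.perf_counter() - start) / iterations
    finally:
        os.close(fd)
        os.unlink(path)

print(f"avg write+fsync: {avg_fsync_latency() * 1000:.3f} ms")
```

On a disk without a write cache this number sits near the rotational latency, which is why CPU-bound checksum costs only show up on hardware where fsync is cheap.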

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
