* Greg Stark (gsst...@mit.edu) wrote:
> I assume you mean back out the changes incrementally until you find a
> full_page_write and see if it matches?

To be honest, you're already assuming I know more about how this all works than I do. :) The gist of my thought was simply this: we write out block changes to the WAL, including data in many cases. If we were to look at the very end of the WAL, at the last piece of data written there, and the data files have supposedly been flushed, then what's in the WAL at that point should match what's in the data files, right? If it doesn't, that'd be bad.

> And continue comparing with
> full_page_writes once per checkpoint?

If we could only do it when there's a full page write, then perhaps that would work as well, but I thought we tracked them at a lower level. In any case, the idea is the same: compare what's in WAL to what's supposed to be on disk, and alarm whenever there's a clear error.

> I don't think the WAL has enough
> information to replay backwards though. For example vacuum cleanup
> records just list the tids to remove. They don't have the contents to
> replace there.

Right, you couldn't actually move the database backwards in time using this tool (because we only write out new data, we don't write out what was in that block/page before the write). That isn't the idea or intent. It would just be a tool that someone could run against a database where they've detected corruption (or, I dunno, more frequently, to perhaps catch corruption faster?), to see if the problem is a PG bug or a hardware/kernel/etc issue.

In fact, if you could somehow do this against a backup that's been taken using pg_start_backup/pg_stop_backup, that would be pretty awesome. I know that if such a tool existed, I'd be happy to run it as part of my regular backup routines. I *always* have all the WALs from my last backup to my next backup (and typically farther back than that, e.g. if I run full backups weekly, I'll have 4 full backups + all 4 weeks of WALs, to be able to replay back to any point in the month).

The big question that I have is: would this actually be productive? Would it actually be able to catch hardware corruption or help at all with PG bugs? Those are the things I'm not really sure about.

Thanks,

Stephen
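
P.S. To make the comparison idea a bit more concrete, here is a minimal sketch in Python. It is not a real tool and does not parse WAL: it assumes some other (hypothetical) step has already extracted full page images from the WAL as (block number, page bytes) pairs for one relation file, and that each image is a complete page. The names `read_block` and `find_mismatches`, and the 8192-byte page size, are assumptions for illustration only. A real check would also have to allow for legitimate differences, for example pages changed by later non-full-page WAL records, or hint-bit updates that are not WAL-logged.

```python
from typing import Dict, Iterable, List, Tuple

BLOCK_SIZE = 8192  # assumed page size (PostgreSQL's default)


def read_block(path: str, block_no: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Read one page from a data file; returns fewer bytes if the file is short."""
    with open(path, "rb") as f:
        f.seek(block_no * block_size)
        return f.read(block_size)


def find_mismatches(
    data_path: str,
    wal_images: Iterable[Tuple[int, bytes]],
    block_size: int = BLOCK_SIZE,
) -> List[int]:
    """Return block numbers whose on-disk page differs from the newest WAL image.

    wal_images is a hypothetical, already-decoded stream of
    (block_no, page_image) pairs in WAL order; the last image for a
    given block is the one compared against the data file.
    """
    latest: Dict[int, bytes] = {}
    for block_no, image in wal_images:
        latest[block_no] = image  # later images overwrite earlier ones

    mismatches: List[int] = []
    for block_no in sorted(latest):
        if read_block(data_path, block_no, block_size) != latest[block_no]:
            mismatches.append(block_no)
    return mismatches
```

Any block number this returns would only be a candidate for a closer look, not proof of corruption, for the reasons above.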