On a nightly basis, we shut the database down and do a file system backup. A short chronology of our database problem:

8/21 - count(*) of user tables succeeded (done once a week to get statistics)

8/23 - A specific search on a specific value (one out of over 2 million) caused postmaster to SEGV. I dropped the index in question and rebuilt it. All appeared OK.

8/28 - count(*) failed - postmaster aborted on a SEGV. I attempted a pg_dumpall. Postmaster aborted with a SEGV about 240 MB into the dump (the database is about 50 GB). I reloaded the 8/25 database and attempted a pg_dumpall. Same failure. I reloaded from the 8/21 file system dump. pg_dumpall worked on this one.

I spent the next almost 48 hours dumping and reloading 240 million rows and reprocessing the several million additions made since 8/21. I didn't dare use the 8/21 database without reloading because I didn't know whether it was good or not.

Question: How can we tell that a database is intact? In the above example, pg_dumpall worked on the 8/21 database. Did it become corrupt between 8/21 and 8/23, or was it already corrupt and got worse? pg_dumpall tells you nothing about the condition of indexes. Could a corrupt index corrupt data blocks?

I'm looking at doing a pg_dumpall on a weekly basis so that we have a point in time where we know we have a recoverable database (a sketch of how that check might be scripted follows below). When the database reaches several hundred GB and over a billion rows, this isn't a great solution, and it doesn't address overall database integrity.
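For illustration, a minimal Python sketch of that weekly check - it streams pg_dumpall to /dev/null and treats a nonzero exit status as failure, so nothing is stored but every row has to be read. The "postgres" superuser name is an assumption; adjust for your installation. And as noted above, a clean pass still says nothing about the indexes.

    #!/usr/bin/env python
    # Crude recoverability probe: if pg_dumpall can stream the whole
    # cluster without error, every heap row is at least readable.
    # The "postgres" superuser name is an assumption for this sketch.
    import subprocess
    import sys

    def dump_readable(user="postgres"):
        """Return True if pg_dumpall reads the entire cluster cleanly."""
        with open("/dev/null", "wb") as sink:
            proc = subprocess.Popen(["pg_dumpall", "-U", user],
                                    stdout=sink, stderr=subprocess.PIPE)
            _, err = proc.communicate()
        if proc.returncode != 0:
            # A backend SEGV shows up as a dropped connection and a
            # nonzero exit status from pg_dumpall.
            sys.stderr.write(err.decode("utf-8", "replace"))
            return False
        return True

    if __name__ == "__main__":
        sys.exit(0 if dump_readable() else 1)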
Back to the original question: how can I verify the complete integrity of a database - especially a very large one, where a reload or full index rebuild could take on the order of days?

Wes