On a nightly basis, we shut the database down and do a file system backup.

A short chronology of our database problem:

8/21 - count(*) of user tables succeeded (done once a week to get
statistics)

8/23 - A specific search on a specific value (one out of over 2 million)
caused postmaster to SEGV.  I dropped the index in question and rebuilt it.
All appeared ok.

8/28 - count(*) failed - postmaster aborted on a SEGV.  I attempted a
pg_dumpall.  Postmaster aborted with SEGV about 240 MB into the dump (the
database is about 50 GB).

I reloaded 8/25 database and attempted a pg_dumpall.  Same failure.

I reloaded from the 8/21 file system dump.  Pg_dumpall worked on this one.
I spent the next almost 48 hours dumping and reloading 240 million rows and
reprocessing the several million additions since 8/21.  I didn't dare use
the 8/21 database without reloading because I didn't know if it was good or
not.


Question:

How can we tell that a database is intact?  In the above example, pg_dumpall
worked on the 8/21 database.  Did it become corrupt between 8/21 and 8/23,
or was it already corrupt and got worse?  Pg_dumpall tells you nothing about
the condition of indexes.  Could a corrupt index corrupt data blocks?

I'm looking at doing a pg_dumpall on a weekly basis so that we have a point
in time where we know we have a recoverable database.  When the database
reaches several hundred GB and over over a billion rows, this isn't a great
solution, and doesn't address the overall database integrity.

Back to the original question...  How can I verify the complete integrity of
a database - especially a very large one where a reload or full index
rebuild could take on the order of days?

Wes 


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Reply via email to