On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian <br...@momjian.us> wrote:
> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote:
>> > Yeah. Believe me -- I know the drill. Most or all of the damage seemed
>> > to be to the system catalogs, with at least two critical tables dropped
>> > or inaccessible in some fashion. A lot of the OIDs seemed to be
>> > pointing at the wrong thing. Couple more data points here:
>> >
>> > *) This database is OLTP, doing ~20 tps avg (but very bursty)
>> > *) Another database on the same cluster was not impacted. However,
>> > it's more OLAP-style and may not have been written to during the
>> > outage
>> >
>> > Now, the infrastructure running this system is running maybe 100-ish
>> > postgres clusters and maybe 1000-ish sql server instances, with
>> > approximately zero unexplained data corruption issues in the 5 years
>> > I've been here. Having said that, this definitely smells and feels
>> > like something on the infrastructure side. I'll follow up if I have
>> > any useful info.
>>
>> After a thorough investigation I now have credible evidence that the
>> source of the damage did not originate from the database itself.
>> Specifically, this database is mounted on the same volume as the
>> operating system (I know, I know) and something non-database-driven
>> sucked up disk space very rapidly and exhausted the volume -- fast
>> enough that sar didn't pick it up. Oh well :-) -- thanks for the help
>
> However, disk space exhaustion should not lead to corruption unless the
> underlying layers lied in some way.
I agree -- however I'm sufficiently removed from the teams running that
layer that I can't verify it in any real way. In the meantime I'm going
to take the standard precautions (enable checksums, move the data to a
dedicated volume, set up replication); rough sketches of what I have in
mind are below. Low disk space also does not explain the bizarre outage
I had last Friday.
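On the checksums point: they can only be switched on at initdb time on
this release, so it means a dump and restore onto the new volume anyway.
Something roughly like this (paths and the backup filename are made up
for illustration):

    # dump everything from the old cluster
    pg_dumpall -f /backup/cluster.sql

    # re-initialize with checksums enabled, on the dedicated volume
    initdb --data-checksums -D /mnt/pgdata/main

    # start the new cluster and reload the dump
    pg_ctl -D /mnt/pgdata/main -l /mnt/pgdata/logfile start
    psql -f /backup/cluster.sql postgres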
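And since sar's sampling interval is evidently too coarse to catch a
spike like this one, I'll probably also leave a dumb high-frequency
disk-space watcher running on that box -- interval and log path
arbitrary:

    # log free space every 5 seconds; crude, but catches fast spikes
    while true; do
      date >> /var/log/df_watch.log
      df -h / >> /var/log/df_watch.log
      sleep 5
    done

merlin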