On Thu, 2023-04-06 at 16:41 +0000, Evgeny Morozov wrote:
> Our PostgreSQL 15.2 instance running on Ubuntu 18.04 has crashed with this
> error:
>
> 2023-04-05 09:24:03.448 UTC [15227] ERROR: index "pg_class_oid_index"
> contains unexpected zero page at block 0
> [...]
>
> We had the same thing happened about a month ago on a different database on
> the same cluster. For a while PG actually ran OK as long as you didn't
> access that specific DB, but when trying to back up that DB with pg_dump it
> would crash every time. At that time one of the disks hosting the ZFS
> dataset with the PG data directory on it was reporting errors, so we thought
> it was likely due to that.
>
> Unfortunately, before we could replace the disks, PG crashed completely and
> would not start again at all, so I had to rebuild the cluster from scratch
> and restore from pg_dump backups (still onto the old, bad disks). Once the
> disks were replaced (all of them) I just copied the data to them using
> zfs send | zfs receive and didn't bother restoring pg_dump backups again -
> which was perhaps foolish in hindsight.
>
> Well, yesterday it happened again. The server still restarted OK, so I took
> fresh pg_dump backups of the databases we care about (which ran fine),
> rebuilt the cluster and restored the pg_dump backups again - now onto the
> new disks, which are not reporting any problems.
>
> So while everything is up and running now this error has me rather
> concerned. Could the error we're seeing now have been caused by some
> corruption in the PG data that's been there for a month (so it could still
> be attributed to the bad disk), which should now be fixed by having restored
> from backups onto good disks?

Yes, that is entirely possible.

> Could this be a PG bug?

It could be, but data corruption caused by bad hardware is much more likely.

> What can I do to figure out why this is happening and prevent it from
> happening again?

No idea about the former, but bad hardware is a good enough explanation.

As to keeping it from happening: use good hardware.

Yours,
Laurenz Albe