On Mon, 2007-12-31 at 18:35 -0500, Tom Lane wrote:
> "Mason Hale" <[EMAIL PROTECTED]> writes:
> >> This could be the kernel's fault, but I'm wondering whether the
> >> RAID controller is going south.
>
> > To clarify a bit further -- on the production server, the data is written to
> > a 10-disk RAID 1+0, but the pg_xlog directory is symlinked to a separate,
> > dedicated SATA II disk.
>
> > There is a similar setup on the standby server, except that in addition to
> > the RAID for the data, and a separate SATA II disk for the pg_xlog, there is
> > another disk (also SATA II) dedicated for the archive of wal files copied
> > over from the production server.
>
> Oh.  Maybe it's one of those disks' fault then.  Although WAL corruption
> would not lead to corruption of the primary DB as long as there were no
> crash/replay events.  Maybe there is more than one issue here, or maybe
> it's the kernel's fault after all.

The standby replays from the archive drive, whereas the primary does
crash recovery from the pg_xlog.

We know that the primary is corrupted in some way, and so is the
standby. We also know the standby corruption occurred after it was
copied to the archive and restored/recovered. So we must have problems
on at least two drives.

If the primary has been through at least one recent crash recovery,
then we might explain all the corruptions by a common issue related to
the SATA II drives. That might be the device driver, but maybe other
things as well.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com