On Mon, 2007-12-31 at 18:35 -0500, Tom Lane wrote:
> "Mason Hale" <[EMAIL PROTECTED]> writes:
> >> This could be the kernel's fault, but I'm wondering whether the
> >> RAID controller is going south.
> 
> > To clarify a bit further -- on the production server, the data is written to
> > a 10-disk RAID 1+0, but the pg_xlog directory is symlinked to a separate,
> > dedicated SATA II disk.
> 
> > There is a similar setup on the standby server, except that in addition to
> > the RAID for the data, and a separate SATA II disk for the pg_xlog, there is
> > another disk (also SATA II) dedicated for the archive of wal files copied
> > over from the production server.
> 
> Oh.  Maybe it's one of those disks' fault then.  Although WAL corruption
> would not lead to corruption of the primary DB as long as there were no
> crash/replay events.  Maybe there is more than one issue here, or maybe
> it's the kernel's fault after all.

The standby replays from the archive drive, whereas the primary does
crash recovery from pg_xlog. We know that both the primary and the
standby are corrupted in some way, and we know the standby corruption
occurred after it was copied to the archive and restored/recovered. So
we must have problems on at least two drives.

If the primary server has had at least one recent crash recovery, then
all of the corruption might be explained by a common issue with the
SATA II drives. That might be the device driver, but it could be other
things as well.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

