On Fri, 2006-07-28 at 22:30, Merlin Moncure wrote:
> On 7/28/06, Arnaud Lesauvage <[EMAIL PROTECTED]> wrote:
> > Csaba Nagy wrote:
> > > I found that PITR using WAL shipping is not protecting against all
> > > failure scenarios... it sure will help if the primary machine's hardware
> > > fails, but in one case it was useless for us: the primary had a linux
> > > kernel with buggy XFS code (that's what I think it was, cause we never
> > > found out for sure) and we did use XFS for the data partition, and at
> > > one point it started to get corruptions at the data page level. The
> > > corruption was promptly transferred to the standby, and therefore it was
> > > also unusable... we had to recover from a backup, with the related
> > > downtime. Not good for business...
> > >
> > OK, but corruption at the data page level is a very unlikely
> > event, isn't it ?

It's not... it just happened to me again, strangely this time on a Slony
replica. It might be that the hardware/OS/FS combination we use is the
problem, or it might be that postgres has some problem with it (I would
rule out Slony being able to produce such things). But it did happen,
and I can't rule out that it will happen again. This time I hope I'll be
able to investigate more closely.

> yes, and that is not a pitr problem, that is a data corruption
> problem. i am very suspicious that slony style replication would
> provide any sort of defense against replicating from a machine which
> is changing bytes from a to b, etc. i think the best defense against
> *that* sort of problem would be synchronous replication via pgpool.

When it happened to us, it was a few blocks in some tables, and I
suspect it was an OS/FS bug. In that case Slony would not propagate the
error: it might propagate bad data, but not the page error itself. So it
might not protect against bad data, but I will be able to switch over
and have a working system immediately, compared to recovering from
yesterday's backup after a downtime of 8 hours. So instead of losing
1 day's worth of data and having a downtime of 8 hours, I'll have a
downtime of 1 minute and a few bad entries in the DB... for the kind of
application we have here, that is definitely a better scenario.

Cheers,
Csaba.