FYI, I have seen Linux software RAID (md) fail to detect failed drives and cause filesystem corruption on many occasions. I would recommend staying away from it. Maybe what you describe is a problem with PG, but I doubt it.
On Jul 12, 2004, at 12:31 PM, Florian G. Pflug wrote:
Hi,

We have again experienced data corruption using 7.4.2 on an XFS filesystem on top of a software RAID (md) RAID-1.
After a server crash last night, we hard-reset the machine. (It was a rather strange crash: the machine was still pingable, but no login was possible, and postgres and apache no longer responded to requests.) It came up again nicely, but a few hours later the following errors occurred when trying to access certain tables. (Those tables are updated heavily: each day about 2 million tuples are inserted, and the old versions of those tuples deleted.)
ERROR: could not access status of transaction 34048
DETAIL: could not open file "/var/lib/postgres/data/pg_clog/0000": No such file or directory
While reading linux-kernel today, I stumbled upon a description of rather strange XFS behaviour: it seems to zero a block if the block was updated and the corresponding metadata update was flushed to disk, but the data itself was not.
This does not happen if the file is fsync()ed after the update, but I was wondering what would happen if the machine crashed between the write() and the fsync().
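To make that window concrete, here is a minimal C sketch (not from the thread itself) of the write-then-fsync pattern in question; the file name and block offset are hypothetical, and the comments mark where a crash would hit the reported XFS behaviour.

/* Sketch: rewrite one block in place, then fsync(). A crash in the
 * marked window is where XFS may have flushed the metadata update
 * but not the data, so the block can read back as zeroes after
 * recovery. "datafile" and the offset are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[8192];
    memset(buf, 'x', sizeof buf);

    int fd = open("datafile", O_WRONLY);    /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    /* Rewrite one block in place at offset 0. */
    if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t) sizeof buf) {
        perror("pwrite");
        return 1;
    }

    /* <-- crash window: the data may still be unflushed while the
     *     metadata update has already been journalled. */

    /* fsync() closes the window by forcing the data to disk. */
    if (fsync(fd) != 0) { perror("fsync"); return 1; }

    close(fd);
    return 0;
}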
The lkml thread about this can be found here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0407.1/0359.html
Could this XFS behaviour cause the postgres problems we are seeing?
greetings, Florian Pflug