I encountered PANICs on CentOS 5.0 when running a write-mostly workload. They occur only if wal_sync_method is set to open_sync; there was no problem with fdatasync. They occurred on both Postgres 8.2.5 and 8.3dev.

PANIC: could not write to log file 0, segment 212 at offset 3399680, length 737280: Input/output error
STATEMENT: COMMIT;

A Linux kernel colleague of mine says that mixing buffered I/O and direct I/O on the same file can cause errors (EIO) on many versions of the Linux kernel. If we use buffered I/O before direct I/O, Linux can fail to discard the kernel buffer cache for the region and report EIO -- yes, it's a bug in Linux.

We use buffered I/O on WAL segments even if wal_sync_method is open_sync. We initialize segments with zeros using buffered I/O, and after that we re-open them with the specified sync options.

The bug behaves differently on RHEL 4 and RHEL 5:

RHEL 4 -> No error is reported even though the kernel cache is inconsistent.
RHEL 5 -> write() fails with EIO (Input/output error).

The PANIC occurs only on RHEL 5, but RHEL 4 also has a problem. If the WAL archiver reads the inconsistent cache of a WAL segment, it could archive wrong contents, and PITR might fail at the corrupted archived file.

I recommend that users on Linux avoid open_sync until the bug is fixed. However, are there any ideas for avoiding the bug while still using direct I/O? Mixing buffered and direct I/O is legal, but it imposes complexity on the kernel. If we simplify our usage, we would be less exposed to such bugs. For example, we could drop the zero-filling and use only direct I/O. Is that possible?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
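
P.S. To make the access pattern concrete, here is a minimal sketch in C of the sequence described above: zero-fill a file with buffered I/O, then re-open it and write with direct I/O. It is not the actual PostgreSQL code. The function names (zero_fill_buffered, rewrite_direct), the 8 kB block size, the 4096-byte buffer alignment, and the exact flag combination O_SYNC | O_DIRECT are assumptions made for illustration only.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* no O_DIRECT here: degrades to plain O_SYNC */
#endif

#define SKETCH_BLCKSZ 8192      /* assumed block size, illustration only */

/* Step 1: create the file and zero-fill it through the page cache.
 * Returns 0 on success, -1 on failure. */
int zero_fill_buffered(const char *path, size_t size)
{
    char zeros[SKETCH_BLCKSZ];
    size_t done;
    int fd;

    assert(size % SKETCH_BLCKSZ == 0);
    memset(zeros, 0, sizeof zeros);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    for (done = 0; done < size; done += SKETCH_BLCKSZ) {
        if (write(fd, zeros, SKETCH_BLCKSZ) != SKETCH_BLCKSZ) {
            close(fd);
            return -1;
        }
    }
    if (fsync(fd) != 0) {       /* data reaches disk, but stays cached too */
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Step 2: re-open the same file and overwrite its start with direct I/O.
 * Returns 0 on success, the errno of the failing call, or -1 on a short
 * write. An EIO from write() is the failure reported on RHEL 5. */
int rewrite_direct(const char *path, size_t len)
{
    void *buf;
    ssize_t n;
    int fd, err = 0;

    assert(len % SKETCH_BLCKSZ == 0);
    /* Direct I/O needs an aligned buffer; 4096 is an assumed alignment. */
    if (posix_memalign(&buf, 4096, len) != 0)
        return ENOMEM;
    memset(buf, 'x', len);

    fd = open(path, O_RDWR | O_SYNC | O_DIRECT);
    if (fd < 0) {
        err = errno;            /* e.g. EINVAL if the filesystem lacks O_DIRECT */
        free(buf);
        return err;
    }
    n = write(fd, buf, len);
    if (n < 0)
        err = errno;
    else if ((size_t) n != len)
        err = -1;
    close(fd);
    free(buf);
    return err;
}
```

Calling zero_fill_buffered() and then rewrite_direct() on the same path gives the buffered-then-direct sequence. On an affected kernel the second call would be where EIO shows up; on a healthy kernel it should simply return 0, so this sketch alone is not a reliable reproducer.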