Hi all,

After browsing through the debian-kernel mailing list archive, I found that no one has reported the latest EXT3 problems in the vanilla kernel yet. The last report of EXT3 problems on the debian-kernel list had to do with JBD; the current problems (as posted on the Linux Kernel mailing list) are much worse, I think. You might want to check these URLs/subjects of discussion on LKML:
"2.6.18-mm2: ext3 BUG?" http://lkml.org/lkml/2006/10/5/353 Seems unresolved "2.6.19 file content corruption on ext3" http://lkml.org/lkml/2006/12/7/163 Has to do with 2.6.19, but might have it's roots in 2.6.18 "Debugging I/O errors?" http://lkml.org/lkml/2006/10/20/93 Source unknown, but more people seem to have the same problem. These issues got my attention, because I'm having those (or similar) problems myself, on two different machines (clusters, actually) with completely different hardware and disks. I'll explain. I'm maintaining two clusters, with machines running a mix between Debian Stable with Etch-kernels to have AoE (ATA over Ethernet support). Machines in these clusters "export" their harddisks using AoE (check out the "vblade" package), and one machine imports those using the kernel "aoe"-module. On top of those imported devices, multiple RAID5-arrays are created, and LVM is running on top of RAID, ext3 on the LVM LV. After a few days, I get EXT3-errors. like this: > EXT3-fs: mounted filesystem with ordered data mode. > EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared for > block 412186 > Aborting journal on device loop0. > EXT3-fs error (device loop0) in ext3_free_blocks_sb: Journal has aborted > EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted > EXT3-fs error (device loop0) in ext3_truncate: Journal has aborted > EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted > EXT3-fs error (device loop0) in ext3_orphan_del: Journal has aborted > EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted > EXT3-fs error (device loop0) in ext3_delete_inode: Journal has aborted > __journal_remove_journal_head: freeing b_committed_data > __journal_remove_journal_head: freeing b_committed_data (...) > __journal_remove_journal_head: freeing b_committed_data > ext3_abort called. 
> EXT3-fs error (device loop0): ext3_journal_start_sb: Detected aborted journal
> Remounting filesystem read-only
> __journal_remove_journal_head: freeing b_committed_data

Running fsck on the filesystem fixes those errors, but after a few days (or weeks, depending on the filesystem load) the corruption appears again. It might be worth telling you that there are no other suspicious messages in my logs.

This seems to be related to the problem described here:
http://myrddin.org/2006/02/14/ext3-nastiness/
and here:
http://www.debian-administration.org/users/Utumno/weblog/16

I don't know whether I need to file a bug on this; for now I just want to hear your thoughts.

FYI, kernel information for cluster 1:

> [EMAIL PROTECTED]:~# uname -a
> Linux infinity 2.6.17-2-686 #1 SMP Wed Sep 13 16:34:10 UTC 2006 i686 GNU/Linux

And cluster 2:

> dust:~# uname -a
> Linux dust 2.6.18-3-686 #1 SMP Thu Nov 23 20:49:23 UTC 2006 i686 GNU/Linux

Thanks for your replies!

Best regards,

--
Bas van Schaik
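
P.S. For anyone who wants to compare their own logs against the excerpt above, here is a minimal sketch of a log summarizer. It is only an illustration: the function name `summarize_ext3_errors` and the regular expression are my own assumptions, modelled on the two "EXT3-fs error" line shapes quoted in this mail, and other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Assumed line shapes, taken from the log excerpt in this mail:
#   EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared for ...
#   EXT3-fs error (device loop0) in ext3_truncate: Journal has aborted
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<device>[^)]+)\)(?::| in) "
    r"(?P<function>\w+): (?P<message>.*)"
)


def summarize_ext3_errors(lines):
    """Count EXT3-fs error lines per (device, function) pair.

    `lines` is any iterable of log lines (a list, or an open file).
    Lines that do not look like EXT3-fs errors are ignored.
    """
    counts = Counter()
    for line in lines:
        match = EXT3_ERROR.search(line)
        if match:
            counts[(match.group("device"), match.group("function"))] += 1
    return counts
```

Passing an open kernel log file to `summarize_ext3_errors` (the log location depends on your syslog setup) gives a quick per-device overview of which ext3 functions reported errors, which makes it easier to see whether two machines show the same pattern.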