the attached patch fixes the ext2-fs filesystem corruption that initially showed up during stress-tests running on Red Hat's internal testsystems, and which bug kept us busy for a long time. We suspect that a number of filesystem corruption bugs reported to l-k were caused by this bug too. after lots of bug-hunting and bug-finding, the main bug turned out to be this one: a block that might still (or already) be allocated by ext2fs was bforgot()ten by ext2_get_block(). The (incorrect) assumption of the code was that this rare case can only happen in case of a parallel truncate() to this inode. In reality there might be enough delay to this particular process in which window this same block might have gotten reallocated and dirtied again. Thus the bforget() discards possibly valid metadata, most commonly indirect blocks, causing filesystem corruption. via customized Cerberus testruns the bug used to trigger within 20 minutes on SMP systems, and within a couple of hours on UP systems. (initially it was very hard to trigger the bug, it tooks days of nonstop high-load tests for the corruption to trigger.) Alex Romosan's report to l-k looks suspiciously similar to the failures we've been getting. The corruption has been introduced in 2.4.0-test6. With this patch applied we've been unable to reproduce any kind of filesystem corruption under the most extreme 'enterprise' loads. The attached patch applies to 2.4.4-pre1 cleanly and compiles, boots just fine. There is a sister-bug in minix-fs as well, the patch fixes that bug too. The patch also includes two inode-dirtying fixes, which bugs never caused any filesystem corruptions but were incorrect nevertheless. Ingo
--- linux/fs/ext2/inode.c.orig Tue Apr 10 19:00:35 2001 +++ linux/fs/ext2/inode.c Tue Apr 10 19:06:25 2001 @@ -568,7 +568,7 @@ changed: while (partial > chain) { - bforget(partial->bh); + brelse(partial->bh); partial--; } goto reread; @@ -799,8 +799,8 @@ /* Writer: ->i_blocks */ inode->i_blocks -= blocks * count; /* Writer: end */ - ext2_free_blocks (inode, block_to_free, count); mark_inode_dirty(inode); + ext2_free_blocks (inode, block_to_free, count); free_this: block_to_free = nr; count = 1; @@ -811,8 +811,8 @@ /* Writer: ->i_blocks */ inode->i_blocks -= blocks * count; /* Writer: end */ - ext2_free_blocks (inode, block_to_free, count); mark_inode_dirty(inode); + ext2_free_blocks (inode, block_to_free, count); } }