> On 07 Nov 2016, at 00:11, Jaromír Doleček <jaromir.dole...@gmail.com> wrote:
> 
>> On Fri, Nov 04, 2016 at 04:44:10PM +0100, J. Hannken-Illjes wrote:
>>> - This change results in "panic: ffs_blkfree_common: freeing free block"
>>>  if I put a file system under stress (*1).
>>> 
>>> - I suppose not zeroing the blocks to be freed before freeing them
>>>  makes the life of fsck harder.
>>> 
>>> - Running "brelse(bp, BC_INVAL)" doesn't look OK.
> 
> The brelse(bp, BC_INVAL) call was there before as well, but the
> condition changed and is wrong.
> 
> I can repeat the problem with your script and the packaged fsx (thanks
> Thomas). Whipped up a patch to what looked wrong there, and it no
> longer panics for me. Patch is attached. I'll test further and commit
> it tomorrow.

Index: ffs_inode.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_inode.c,v
retrieving revision 1.118
diff -u -p -r1.118 ffs_inode.c
--- ffs_inode.c 28 Oct 2016 20:38:12 -0000      1.118
+++ ffs_inode.c 6 Nov 2016 23:09:15 -0000
@@ -675,18 +675,18 @@ ffs_indirtrunc(struct inode *ip, daddr_t
         * Recursively free blocks on the now last partial indirect block.
         */
        if (level > SINGLE && lastbn >= 0) {
-               nb = RBAP(ip, last);
+               last = lastbn % factor;
+               nb = RBAP(ip, i);
                if (nb != 0) {
                        error = ffs_indirtrunc(ip, nlbn, FFS_FSBTODB(fs, nb),
-                                              lastbn % factor, level - 1,
-                                              countp);
+                                              last, level - 1, countp);
                        if (error)
                                goto out;
                }
        }
 
 out:
-       if (RBAP(ip, 0) == 0) {
+       if (lastbn < 0 && error == 0) {
                /* all freed, release without writing back */
                brelse(bp, BC_INVAL);
        } else {

The first part should not be necessary.  After the loop we should have
"i == last" -- from a quick look "i < last" is impossible.

The second part is right and responsible for most of the panics.

Your patch still doesn't address my second observation, if the
machine crashs inside ffs_truncate we leave the file system in a 
state fsck can't handle.  The blocks we freed get attached to other
nodes before they get cleared from our on-disk copy leading to
duplicate blocks.

The attached diff:

- Brings back the non-wapbl behaviour to first clear the allocations
  in the on-disk node and then deallocating them.

- Fixes the condition to brelse() on exit like your diff does.

- Makes sure a block that is completely deallocated and NOT written
  to disk gets deallocated even if this deallocation would fail.

Please review.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)

Attachment: ffs_inode.c.diff
Description: Binary data

Reply via email to