OGAWA Hirofumi <[EMAIL PROTECTED]> wrote:
>
> ...
> I got the above Oops while doing fs stress test.
> 
> The cause of this was race condition between __sync_single_inode() and
> iput()/bdev_clear_inode().
> 
> This race seems following condition.
> 
>           cpu0 (fs's inode)                 cpu1 (bdev's inode)
> ------------------------------------------------------------------------
>                                                close("/dev/hda2")
>                                        [...]
> __sync_single_inode()
>    /* copy the bdev's ->i_mapping */
>    mapping = inode->i_mapping;
> 
>                                        generic_forget_inode()
>                                           bdev_clear_inode()
>                                            /* restre the fs's ->i_mapping */
>                                            inode->i_mapping = &inode->i_data;
>                                         /* bdev's inode was freed */
>                                           destroy_inode(inode);
> 
>    if (wait) {
>       /* dereference a freed bdev's mapping->host */
>       filemap_fdatawait(mapping);  /* Oops */
> 

The __sync_single_inode() caller takes a ref on the inode to prevent things
like this from happening.

What was the call path on the other process?  The one running
destroy_inode()?  unmount?

> I wrote the attached patch for making sure fs's inode is not in
> __sync_single_inode().

It would be nicer to honour the extra inode ref in the unmount path, if
poss.  Only swizzle i_mapping when the inode is really not in use.

> +/* Called under inode_lock. */
> +void wait_inode_ilock(struct inode *inode)
> +{
> +     wait_queue_head_t *wqh;
> +     DEFINE_WAIT_BIT(wq, &inode->i_state, __I_LOCK);
> +
> +     if (!(inode->i_state & I_LOCK))
> +             return;
> +
> +     wqh = bit_waitqueue(&inode->i_state, __I_LOCK);
> +     do {
> +             __iget(inode);
> +             spin_unlock(&inode_lock);
> +             __wait_on_bit(wqh, &wq, inode_wait, TASK_UNINTERRUPTIBLE);
> +             iput(inode);
> +             spin_lock(&inode_lock);
> +     } while (inode->i_state & I_LOCK);
> +}

Does this differ from wait_on_inode()?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to