On Mon, Sep 25, 2017 at 05:14:04PM -0600, Ross Zwisler wrote:
> Re-enable the XFS per-inode DAX flag, preventing S_DAX from changing when
> any mappings are present.
> 
> Signed-off-by: Ross Zwisler <ross.zwis...@linux.intel.com>
> ---
>  fs/xfs/xfs_ioctl.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 386b437..7a24dd5 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1012,12 +1012,10 @@ xfs_diflags_to_linux(
>               inode->i_flags |= S_NOATIME;
>       else
>               inode->i_flags &= ~S_NOATIME;
> -#if 0        /* disabled until the flag switching races are sorted out */
>       if ((xflags & FS_XFLAG_DAX) || (ip->i_mount->m_flags & XFS_MOUNT_DAX))
>               inode->i_flags |= S_DAX;
>       else
>               inode->i_flags &= ~S_DAX;
> -#endif
>  }
>  
>  static bool
> @@ -1049,6 +1047,8 @@ xfs_ioctl_setattr_xflags(
>  {
>       struct xfs_mount        *mp = ip->i_mount;
>       uint64_t                di_flags2;
> +     struct address_space    *mapping = VFS_I(ip)->i_mapping;
> +     bool                    dax_changing;
>  
>       /* Can't change realtime flag if any extents are allocated. */
>       if ((ip->i_d.di_nextents || ip->i_delayed_blks) &&
> @@ -1084,10 +1084,23 @@ xfs_ioctl_setattr_xflags(
>       if (di_flags2 && ip->i_d.di_version < 3)
>               return -EINVAL;
>  
> +     dax_changing = xfs_is_dax_state_changing(fa->fsx_xflags, ip);
> +     if (dax_changing) {
> +             i_mmap_lock_read(mapping);
> +             if (mapping_mapped(mapping)) {
> +                     i_mmap_unlock_read(mapping);
> +                     return -EBUSY;
> +             }
> +     }
> +
>       ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags);
>       ip->i_d.di_flags2 = di_flags2;
>  
>       xfs_diflags_to_linux(ip);
> +
> +     if (dax_changing)
> +             i_mmap_unlock_read(mapping);

Is this safe to be taking here under the ILOCK_EXCL? i.e. this is
the lock order here:

IOLOCK_EXCL -> MMAPLOCK_EXCL -> ILOCK_EXCL -> i_mmap_rwsem

The truncate path must run outside the ILOCK
context, and it does this order via unmap_mapping_range:

IOLOCK_EXCL -> MMAPLOCK_EXCL -> i_mmap_rwsem

On a page fault, we do:

        mmap_sem -> MMAPLOCK_EXCL -> page lock -> ILOCK_EXCL

Which gives the order

IOLOCK_EXCL
  -> mmap_sem
     -> MMAPLOCK_EXCL
        -> page lock
           -> ILOCK_EXCL
              -> i_mmap_rwsem

What I'm not clear on is what the orders between page locks and
pte locks and i_mapping_tree_lock and i_mmap_rwsem. If there's any
locks that the filesystem can take above the ILOCK that are also
taken under the i_mmap_rwsem, then we have a deadlock vector.

Historically we've avoided any mm/ level interactions under the
ILOCK_EXCL because of it's location in the page fault path locking
order (e.g. lockdep will go nuts if we take a page fault with the
ILOCK held). Hence I'm extremely wary of putting any other mm/ level
locks under the ILOCK like this without a clear explanation of the
locking orders and why it won't deadlock....

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com

Reply via email to