This patch set progresses my effort to improve concurrency of
directory operations and specifically to allow concurrent updates
in a given directory.

There are a bunch of VFS patches which introduce some new APIs and
improve existing ones.  Then there are a bunch of per-filesystem changes
which adjust to meet new needs, often using the new APIs, then a final bunch
of VFS patches which discard some APIs that are no longer wanted, and
one (the second-last) which makes the big change.  Some of the fs
patches don't depend on any preceding patch, and if maintainers wanted
to take those early I certainly wouldn't object!  I've put a '*' next
to patches which I think can be taken at any time.

My longer term goal involves pushing the parent-directory locking down
into filesystems (which can then discard it if it isn't needed) and using
exclusive dentry locking in the VFS for all directory operations other
than readdir, which by its nature needs shared locking and will
continue to use the directory lock.

The VFS already has exclusive dentry locking for the limited case of
lookup.  Newly created dentries (when created by d_alloc_parallel()) are
exclusively locked using the DCACHE_PAR_LOOKUP bit.  They remain
exclusively locked until they are hashed as negative or positive dentries,
or they are discarded.

DCACHE_PAR_LOOKUP currently depends on a shared parent lock to exclude
directory modifying operations.  This patch set removes this dependency
so that d_alloc_parallel() can be called without locking and all
directory modifying operations receive either a hashed dentry or an
in-lookup dentry (they currently receive either a hashed or an unhashed
dentry, or sometimes, for atomic_open only, an in-lookup dentry).

The cases where a filesystem can receive an in-lookup dentry are:
 - lookup.  Currently can receive in-lookup or unhashed.  After this
    patch set it always receives in-lookup.
 - atomic_open.  Currently can receive in-lookup or hashed-negative.
    This doesn't change with this patch set.
 - rename.  Currently can receive hashed or unhashed.  After this patch set
    it can also receive in-lookup where previously it would receive
    unhashed.  This is only for the target of a rename over NFS.
 - link, mknod, mkdir, symlink.  Currently receive hashed-negative, except
    for NFS, which notices the implied exclusive create and skips the
    lookup, so the filesystem can receive unhashed-negative for the
    operation.

There are two particular needs to be addressed before we can use
d_alloc_parallel() outside of the directory lock.

1/ d_alloc_parallel() acts as a blocking lock, so lock ordering is important.
  If we are to take the directory lock *after* calling d_alloc_parallel() (and
  still holding an in-lookup dentry, as happens at least when ->atomic_open
  is called) then we must never call d_alloc_parallel() while holding the
  directory lock, even a shared lock.
  This particularly affects readdir, as several filesystems prime the dcache
  with readdir results and so use d_alloc_parallel() in the ->iterate_shared
  handler, which would now risk deadlock.  To address this we
  introduce d_alloc_noblock(), which fails rather than blocking.

  A few other cases of potential lock inversion exist.  These are
  addressed by dropping the directory lock, when it is safe to do so,
  before calling d_alloc_parallel().  This requires the addition of
  LOOKUP_SHARED so that ->lookup knows how the parent is locked.  This
  is ugly, but it is gone by the end of the series: after the locking is
  rearranged in the second-last patch, ->lookup is only ever called
  with a shared lock.


2/ As d_alloc_parallel() will be able to run without the directory lock,
  holding that lock exclusively is not enough to protect some dcache
  manipulations.  In particular, several filesystems d_drop() a dentry
  and (possibly) re-hash it.  This will no longer be safe as
  d_alloc_parallel() could run while the dentry was dropped, find
  that the name doesn't exist in the dcache, and create a new dentry,
  leading to two uncoordinated dentries with the same name.

  It will still be safe to d_drop() a dentry after the operation has
  completed, whether it succeeded or failed.  But d_drop()ing before that
  is best avoided.  An early d_drop() that isn't followed by a rehash is
  not clearly problematic for a filesystem which still uses parent locking
  (as all do at present), but it is good to discourage that pattern now.

  This is addressed, in part, by changing d_splice_alias() to be able to
  instantiate any negative dentry, whether hashed, unhashed, or
  in-lookup.  This removes the need for d_drop() in most cases.
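  To make the race in 2/ concrete, here is a single-threaded replay of
  the bad interleaving in a toy user-space dcache.  Everything here
  (toy_lookup(), toy_drop(), toy_rehash(), toy_instantiate()) is
  hypothetical and only loosely modelled on the real dcache: toy_lookup()
  stands in for a lockless d_alloc_parallel() that finds or creates a
  dentry, and to keep the sketch small a created dentry is treated as
  hashed straight away rather than in-lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of a dcache; not the kernel API. */
struct toy_dentry {
	const char *name;
	bool hashed;    /* visible to lookups */
	int ino;        /* 0 means negative */
	bool in_use;
};

#define TOY_NR 8
static struct toy_dentry toy_dc[TOY_NR];

/* Return a hashed dentry for name, or allocate a fresh (hashed, negative)
 * one, standing in for d_alloc_parallel() finding nothing and creating. */
static struct toy_dentry *toy_lookup(const char *name)
{
	for (size_t i = 0; i < TOY_NR; i++)
		if (toy_dc[i].in_use && toy_dc[i].hashed &&
		    strcmp(toy_dc[i].name, name) == 0)
			return &toy_dc[i];
	for (size_t i = 0; i < TOY_NR; i++)
		if (!toy_dc[i].in_use) {
			toy_dc[i] = (struct toy_dentry){
				.name = name, .hashed = true, .in_use = true };
			return &toy_dc[i];
		}
	assert(!"toy cache full");
	return NULL;
}

/* Count hashed dentries with this name; more than one is the bug. */
static size_t toy_count_hashed(const char *name)
{
	size_t n = 0;
	for (size_t i = 0; i < TOY_NR; i++)
		if (toy_dc[i].in_use && toy_dc[i].hashed &&
		    strcmp(toy_dc[i].name, name) == 0)
			n++;
	return n;
}

/* Old pattern: unhash, perform the operation, then rehash. */
static void toy_drop(struct toy_dentry *d) { d->hashed = false; }
static void toy_rehash(struct toy_dentry *d, int ino)
{
	d->ino = ino;
	d->hashed = true;
}

/* New pattern: instantiate the negative dentry in place, never unhashing. */
static void toy_instantiate(struct toy_dentry *d, int ino)
{
	assert(d->ino == 0);
	d->ino = ino;
}

static void toy_reset(void) { memset(toy_dc, 0, sizeof(toy_dc)); }
```

  Replaying "drop, concurrent lookup, rehash" leaves two hashed dentries
  for one name, while instantiating in place means the concurrent lookup
  simply finds the existing dentry.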

New APIs added are:

 - d_alloc_noblock - see patch 05 for details
 - d_duplicate - patch 06
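A minimal user-space sketch of why a non-blocking variant helps readdir
priming, under stated assumptions: the toy_* names, the TOY_BUSY return
code and the toy_dir_locked flag are all hypothetical, and patch 05 is the
authority on what d_alloc_noblock() really looks like.  The blocking
flavour asserts that the directory lock is not held, encoding the ordering
rule from 1/; the non-blocking flavour reports a busy in-lookup name
instead of waiting, so priming under the lock just skips that name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model; names and return codes are illustrative only. */
enum toy_state { TOY_FREE, TOY_IN_LOOKUP, TOY_HASHED };
struct toy_dentry { char name[16]; enum toy_state state; };

#define TOY_NR 8
static struct toy_dentry toy_dc[TOY_NR];
static bool toy_dir_locked;   /* stands in for the parent directory lock */

enum toy_res { TOY_OK_EXISTING, TOY_OK_NEW, TOY_BUSY, TOY_FULL };

/* Find a cached name, or create a new in-lookup entry owned by the caller.
 * TOY_BUSY means someone else currently owns an in-lookup entry. */
static enum toy_res toy_find_or_create(const char *name, struct toy_dentry **out)
{
	struct toy_dentry *spare = NULL;

	assert(strlen(name) < sizeof(toy_dc[0].name));
	for (size_t i = 0; i < TOY_NR; i++) {
		struct toy_dentry *d = &toy_dc[i];
		if (d->state == TOY_FREE) {
			if (!spare)
				spare = d;
		} else if (strcmp(d->name, name) == 0) {
			*out = d;
			return d->state == TOY_IN_LOOKUP ? TOY_BUSY : TOY_OK_EXISTING;
		}
	}
	if (!spare)
		return TOY_FULL;
	strcpy(spare->name, name);
	spare->state = TOY_IN_LOOKUP;
	*out = spare;
	return TOY_OK_NEW;
}

/* Blocking flavour: may sleep on another caller's in-lookup entry, so it
 * must never run under the directory lock.  This model only returns
 * TOY_BUSY where a real implementation would wait. */
static enum toy_res toy_alloc_parallel(const char *name, struct toy_dentry **out)
{
	assert(!toy_dir_locked);
	return toy_find_or_create(name, out);
}

/* Non-blocking flavour: never waits, so it is safe under the dir lock. */
static enum toy_res toy_alloc_noblock(const char *name, struct toy_dentry **out)
{
	return toy_find_or_create(name, out);
}

/* Readdir-style priming: with the directory lock held, cache names that
 * are not cached yet and skip any that are busy rather than waiting. */
static size_t toy_prime(const char *const *names, size_t n)
{
	size_t primed = 0;

	toy_dir_locked = true;
	for (size_t i = 0; i < n; i++) {
		struct toy_dentry *d;
		if (toy_alloc_noblock(names[i], &d) == TOY_OK_NEW) {
			d->state = TOY_HASHED;   /* instantiate from readdir data */
			primed++;
		}
	}
	toy_dir_locked = false;
	return primed;
}
```

Skipping a busy name costs nothing but a missed cache entry, since the
owner of the in-lookup dentry will hash it anyway, which is why failing is
an acceptable answer for cache priming.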

Removed APIs:

 - d_alloc
 - d_rehash
 - d_add
 - lookup_one
 - lookup_noperm

Changed APIs:

 - d_alloc_parallel - no longer requires a wait_queue_head_t
 - d_splice_alias - now works with an in-lookup dentry
 - d_alloc_name - now works with ->d_hash

d_alloc_name() should be used with d_make_persistent().  These don't require
VFS locking, as the filesystem doesn't permit create/remove via VFS calls
and provides its own locking to avoid duplicate names.
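The sketch below, assuming a case-insensitive filesystem, illustrates why
it matters that d_alloc_name() now honours ->d_hash: with byte-wise
hashing "Foo" and "foo" look distinct, while with a case-folding hash and
compare they are the same name.  All toy_* names are hypothetical, and the
duplicate check inside toy_alloc_name() stands in for the filesystem's own
locking and bookkeeping, not for anything d_alloc_name() itself does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-filesystem name operations, loosely in the spirit of
 * ->d_hash and ->d_compare.  None of these are kernel APIs. */
struct toy_ops {
	unsigned long (*hash)(const char *name);
	bool (*equal)(const char *a, const char *b);
};

static unsigned long toy_default_hash(const char *s)
{
	unsigned long h = 5381;   /* djb2, purely illustrative */
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

static unsigned long toy_ci_hash(const char *s)
{
	unsigned long h = 5381;
	while (*s)
		h = h * 33 + (unsigned char)tolower((unsigned char)*s++);
	return h;
}

static bool toy_ci_equal(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return false;
	return *a == *b;   /* both strings must end together */
}

#define TOY_MAX 8
struct toy_dir {
	const struct toy_ops *ops;   /* NULL means plain byte-wise names */
	const char *names[TOY_MAX];
	unsigned long hashes[TOY_MAX];
	size_t n;
};

/* Add a name unless an equivalent one exists.  The caller is assumed to
 * hold the filesystem's own lock, which is what prevents duplicates
 * without any help from the VFS directory lock. */
static bool toy_alloc_name(struct toy_dir *dir, const char *name)
{
	const struct toy_ops *ops = dir->ops;
	unsigned long h = ops && ops->hash ? ops->hash(name)
					   : toy_default_hash(name);

	for (size_t i = 0; i < dir->n; i++) {
		if (dir->hashes[i] != h)
			continue;
		bool same = ops && ops->equal ? ops->equal(dir->names[i], name)
					      : strcmp(dir->names[i], name) == 0;
		if (same)
			return false;   /* duplicate under this filesystem's rules */
	}
	assert(dir->n < TOY_MAX);
	dir->names[dir->n] = name;
	dir->hashes[dir->n] = h;
	dir->n++;
	return true;
}
```

Without the filesystem's hash the second name would slip past the
duplicate check on a case-insensitive filesystem, since its hash would
differ from that of the first.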

d_splice_alias() should *always* be used:
  in ->lookup
  in ->iterate_shared for cache priming
  in ->atomic_open, possibly via a call to ->lookup
  in ->mkdir unless d_instantiate_new() can be used
  in ->link, ->symlink and ->mknod if ->lookup skips the lookup for
    LOOKUP_CREATE|LOOKUP_EXCL
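A toy model of the d_splice_alias() contract relied on above, assuming the
simplified semantics described in this letter: a negative dentry in any
state (hashed, unhashed or in-lookup) is instantiated and NULL is
returned, except that a directory which already has an alias gets that
alias back instead.  The toy_* names are hypothetical; the real function
also moves the alias into place and handles errors, which this sketch
omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the d_splice_alias() contract; not kernel code. */
enum toy_dstate { TOY_UNHASHED, TOY_IN_LOOKUP, TOY_HASHED };

struct toy_dentry;

struct toy_inode {
	bool is_dir;
	struct toy_dentry *alias;   /* directories have at most one alias */
};

struct toy_dentry {
	enum toy_dstate state;
	struct toy_inode *inode;    /* NULL means negative */
};

static struct toy_dentry *toy_splice_alias(struct toy_inode *inode,
					   struct toy_dentry *dentry)
{
	assert(dentry->inode == NULL);   /* only negative dentries */
	if (inode && inode->is_dir && inode->alias) {
		/* Real code would also move the alias to dentry's name. */
		inode->alias->state = TOY_HASHED;
		return inode->alias;
	}
	dentry->inode = inode;           /* NULL leaves it negative */
	if (inode && inode->is_dir)
		inode->alias = dentry;
	dentry->state = TOY_HASHED;      /* any in-lookup phase ends here */
	return NULL;
}

/* Shape of a ->lookup: return whatever the splice returned, where NULL
 * means "use the dentry that was passed in". */
static struct toy_dentry *toy_lookup(struct toy_inode *found,
				     struct toy_dentry *dentry)
{
	return toy_splice_alias(found, dentry);
}
```

Because one call handles every negative state, the filesystem never
needs to d_drop() the dentry first just to make it acceptable.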

Thanks for reading this far!  I've been testing NFS but haven't tried
anything else yet.  As well as the normal review of details I'd love to
know if I've missed any important consequences of the locking change.
It is a big conceptual change and there could easily be surprising
implications.

Thanks,
NeilBrown


 [PATCH 01/53] VFS: fix various typos in documentation for
 [PATCH 02/53] VFS: enhance d_splice_alias() to handle in-lookup
 [PATCH 03/53] VFS: allow d_alloc_name() to be used with ->d_hash
 [PATCH 04/53] VFS: use global wait-queue table for d_alloc_parallel()
 [PATCH 05/53] VFS: introduce d_alloc_noblock()
 [PATCH 06/53] VFS: add d_duplicate()
 [PATCH 07/53] VFS: Add LOOKUP_SHARED flag.
 [PATCH 08/53] VFS/xfs: drop parent lock across d_alloc_parallel() in
*[PATCH 09/53] nfs: remove d_drop()/d_alloc_parallel() from
 [PATCH 10/53] nfs: use d_splice_alias() in nfs_link()
 [PATCH 11/53] nfs: don't d_drop() before d_splice_alias()
 [PATCH 12/53] nfs: don't d_drop() before d_splice_alias() in
 [PATCH 13/53] nfs: Use d_alloc_noblock() in nfs_prime_dcache()
 [PATCH 14/53] nfs: use d_alloc_noblock() in silly-rename
 [PATCH 15/53] nfs: use d_duplicate()
*[PATCH 16/53] ovl: drop dir lock for lookups in impure readdir
*[PATCH 17/53] coda: don't d_drop() early.
 [PATCH 18/53] shmem: use d_duplicate()
*[PATCH 19/53] afs: use d_time instead of d_fsdata
*[PATCH 20/53] afs: don't unhash/rehash dentries during unlink/rename
 [PATCH 21/53] afs: use d_splice_alias() in afs_vnode_new_inode()
 [PATCH 22/53] afs: use d_alloc_noblock in afs_sillyrename()
 [PATCH 23/53] afs: lookup_atsys to drop and reclaim lock.
 [PATCH 24/53] afs: use d_duplicate()
*[PATCH 25/53] smb/client: use d_time to store a timestamp in dentry,
*[PATCH 26/53] smb/client: don't unhashed and rehash to prevent new
*[PATCH 27/53] smb/client: use d_splice_alias() in atomic_open
 [PATCH 28/53] smb/client: Use d_alloc_noblock() in
*[PATCH 29/53] exfat: simplify exfat_lookup()
*[PATCH 30/53] configfs: remove d_add() calls before
 [PATCH 31/53] configfs: stop using d_add().
*[PATCH 32/53] ext4: move dcache modifying code out of __ext4_link()
*[PATCH 33/53] ext4: use on-stack dentries in
 [PATCH 34/53] tracefs: stop using d_add().
 [PATCH 35/53] cephfs: stop using d_add().
*[PATCH 36/53] cephfs: remove d_alloc from CEPH_MDS_OP_LOOKUPNAME
 [PATCH 37/53] cephfs: Use d_alloc_noblock() in
 [PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias()
 [PATCH 39/53] ecryptfs: stop using d_add().
 [PATCH 40/53] gfs2: stop using d_add().
 [PATCH 41/53] libfs: stop using d_add().
 [PATCH 42/53] fuse: don't d_drop() before d_splice_alias()
 [PATCH 43/53] fuse: Use d_alloc_noblock() in fuse_direntplus_link()
 [PATCH 44/53] hostfs: don't d_drop() before d_splice_alias() in
 [PATCH 45/53] efivarfs: use d_alloc_name()
 [PATCH 46/53] Remove references to d_add() in documentation and
 [PATCH 47/53] VFS: make d_alloc() local to VFS.
 [PATCH 48/53] VFS: remove d_add()
 [PATCH 49/53] VFS: remove d_rehash()
 [PATCH 50/53] VFS: remove lookup_one() and lookup_noperm()
 [PATCH 51/53] VFS: use d_alloc_parallel() in lookup_one_qstr_excl().
 [PATCH 52/53] VFS: lift d_alloc_parallel above inode_lock
 [PATCH 53/53] VFS: remove LOOKUP_SHARED
