Problem Summary:
---------------
If an xattr directory inode and its xattr child inode are on the _same_
disposal list, and the xattr directory inode is _before_ its xattr child
inode in that list, then zfs_purgedir() of the xattr directory calls
zfs_zget() for the xattr child inode and loops forever. It can only stop
once the xattr child inode is disposed/evicted, but that can only happen
_after_ the current node in the disposal list, which is the one looping.

zfs_zget() loops because it gets non-NULL from dmu_buf_get_user() (which
could only become NULL in the ZFS evict path, later in the disposal list),
so it calls igrab(). igrab() returns NULL (because the inode's i_state has
I_FREEING set), and then 'goto again;' repeats the whole sequence.
Function path:

shrink_slab
- do_shrink_slab
  - shrinker->scan_objects == super_cache_scan
    - prune_icache_sb
      - list_lru_shrink_walk
        (creates disposal list with xattr dir & child inodes)
        - inode_lru_isolate(inode)
          - inode->i_state |= I_FREEING
            (problem for igrab of xattr child inode, below)
      - dispose_list
        - evict(xattr dir inode)
          - op->evict_inode == zpl_evict_inode
            - zfs_inactive
              - zfs_zinactive
                - zfs_rmnode
                  - zfs_purgedir
                    - zfs_zget (xattr child nodes)
                      - dmu_buf_get_user (non-NULL)
                      - igrab (NULL)
                      - goto again;
        ... thus never reaching ...
        - evict(xattr child inode)
          - op->evict_inode == zpl_evict_inode
            - zfs_inactive
              - zfs_zinactive
                - zfs_znode_dmu_fini
                  - sa_handle_destroy
                    - dmu_buf_remove_user
                      (this has not been called yet, which is why
                      dmu_buf_get_user, above, returns non-NULL;
                      calling it would make dmu_buf_get_user return
                      NULL, so zfs_zget would not reach igrab)
--
https://bugs.launchpad.net/bugs/1839521
Title:
Xenial: ZFS deadlock in shrinker path with xattrs
Status in zfs-linux package in Ubuntu:
Invalid
Status in zfs-linux source package in Xenial:
In Progress
Status in zfs-linux source package in Bionic:
Invalid
Status in zfs-linux source package in Disco:
Invalid
Status in zfs-linux source package in Eoan:
Invalid
Bug description:
[Impact]
* Xenial's ZFS can deadlock in the memory shrinker path
  after removing files with extended attributes (xattrs).
* Extended attributes are enabled by default, but are
  _not_ used by default, which reduces the likelihood.
* This problem is very difficult/rare to reproduce,
  due to the file/xattr/remove/shrinker/LRU order and
  timing circumstances required (it took weeks for one
  reporting user), but a synthetic test case has been
  found for testing.
[Test Case]
* A synthetic reproducer is available for this LP bug,
  with a few steps to touch/setfattr/rm/drop_caches
  plus a kernel module to massage the disposal list.
* In the original ZFS module:
  the xattr dir inode is not purged immediately on
  file removal, but possibly _two_ shrinker invocations
  later. This allows another thread, started before the
  file removal, to call zfs_zget() on the xattr child
  inode and iput() it, so that it ends up on the same
  disposal list as the xattr dir inode.
* In the modified ZFS module:
  the xattr dir inode is purged immediately on file
  removal, rather than possibly later on a shrinker
  invocation, so the problem window above no longer exists.
[Regression Potential]
* Low. The patches are confined to extended attributes
  in ZFS, specifically node removal/purge, plus another
  change to how an xattr child inode tracks its xattr dir
  (parent) inode, so that it can be purged immediately
  on removal.
* The ZFS test suite has been run on the original and
  modified zfs-dkms package/kernel modules, with no
  regressions.