Problem Summary:
---------------

If an xattr directory inode and its xattr child inode are on the _same_ disposal list, and the xattr directory inode is _before_ its xattr child inode in this disposal list...
Then zfs_purgedir() of the xattr directory calls zfs_zget() for the xattr child inode, and zfs_zget() loops forever. The loop can only end once the xattr child inode is disposed/evicted, but that can only happen _later_ in the disposal list, and the current list node is the one that is looping...

This happens because zfs_zget() gets a non-NULL value from dmu_buf_get_user() (which can only become NULL in the ZFS evict path, later in the disposal list), so it goes on to igrab(). That returns NULL (because inode.i_state has I_FREEING set), so zfs_zget() takes 'goto again', which repeats the same steps over and over.

Function path:

shrink_slab
- do_shrink_slab
  - shrinker->scan_objects == super_cache_scan
    - prune_icache_sb
      - list_lru_shrink_walk (creates disposal list with xattr dir & child inodes)
        - inode_lru_isolate(inode)
          - inode->i_state |= I_FREEING (problem for igrab of xattr child inode, below)
      - dispose_list
        - evict(xattr dir inode)
          - op->evict_inode == zpl_evict_inode
            - zfs_inactive
              - zfs_zinactive
                - zfs_rmnode
                  - zfs_purgedir
                    - zfs_zget (xattr child nodes)
                      - dmu_buf_get_user (non-NULL)
                      - igrab (NULL)
                      - goto again;
        ... thus never reaching ...
        - evict(xattr child inode)
          - op->evict_inode == zpl_evict_inode
            - zfs_inactive
              - zfs_zinactive
                - zfs_znode_dmu_fini
                  - sa_handle_destroy
                    - dmu_buf_remove_user (not calling this yet is a problem for dmu_buf_get_user, above; calling it would make dmu_buf_get_user return NULL, so zfs_zget would not go into the igrab call)

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1839521

Title:
  Xenial: ZFS deadlock in shrinker path with xattrs

Status in zfs-linux package in Ubuntu:
  Invalid
Status in zfs-linux source package in Xenial:
  In Progress
Status in zfs-linux source package in Bionic:
  Invalid
Status in zfs-linux source package in Disco:
  Invalid
Status in zfs-linux source package in Eoan:
  Invalid

Bug description:
  [Impact]

   * Xenial's ZFS can deadlock in the memory shrinker path after removing files with extended attributes (xattr).

   * Extended attributes are enabled by default, but are _not_ used by default, which reduces the likelihood.

   * It's very difficult/rare to reproduce this problem, because of the file/xattr/remove/shrinker/LRU order/timing circumstances required (it took weeks for a user who reported it), but a synthetic test case has been found for testing.

  [Test Case]

   * A synthetic reproducer is available for this LP, with a few steps to touch/setfattr/rm/drop_caches plus a kernel module to massage the disposal list.

   * In the original ZFS module: the xattr dir inode is not purged immediately on file removal, but possibly purged _two_ shrinker invocations later. This allows another thread, started before the file removal, to call zfs_zget() on the xattr child inode and iput() it, so that it ends up on the same disposal list as the xattr dir inode.

   * In the modified ZFS module: the xattr dir inode is purged immediately on file removal, rather than possibly later on a shrinker invocation, so the problem window above no longer exists.

  [Regression Potential]

   * Low. The patches are confined to extended attributes in ZFS, specifically node removal/purge, plus a change in how an xattr child inode tracks its xattr dir (parent) inode, so that it can be purged immediately on removal.

   * The ZFS test-suite has been run on the original/modified zfs-dkms package/kernel modules, with no regressions.
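
For illustration only, the ordering dependency from the problem summary (xattr dir inode evicted before its xattr child inode on the same disposal list) can be modelled in plain userspace C. This is a rough sketch and not ZFS or kernel code: struct sim_inode and every sim_* function are hypothetical stand-ins, the max_tries bound exists only so the simulation terminates (the loop described above is unbounded), and only the branches named in the function path are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an inode; not a kernel structure. */
struct sim_inode {
	bool freeing;                  /* models I_FREEING set in inode->i_state */
	bool has_dbuf_user;            /* models dmu_buf_get_user() returning non-NULL */
	struct sim_inode *xattr_child; /* set only on the xattr directory inode */
};

/* Models igrab(): refuses an inode that is already being freed. */
static struct sim_inode *sim_igrab(struct sim_inode *ip)
{
	return ip->freeing ? NULL : ip;
}

/*
 * Models the retry loop described for zfs_zget(). max_tries is an
 * assumption of this sketch so that it terminates.
 * Returns true if the lookup finished, false if it would keep retrying.
 */
static bool sim_zget(struct sim_inode *ip, int max_tries)
{
	assert(max_tries > 0);
	for (int tries = 0; tries < max_tries; tries++) {
		if (!ip->has_dbuf_user)
			return true;  /* already torn down: igrab() is not attempted */
		if (sim_igrab(ip) != NULL)
			return true;  /* reference taken */
		/* igrab() failed: this is the 'goto again' retry */
	}
	return false;
}

/* Models evict(): look up the xattr child first (if any), then tear down. */
static bool sim_evict(struct sim_inode *ip, int max_tries)
{
	if (ip->xattr_child != NULL && !sim_zget(ip->xattr_child, max_tries))
		return false;          /* stuck in the zfs_purgedir() path */
	ip->has_dbuf_user = false; /* models dmu_buf_remove_user() */
	return true;
}

/*
 * Marks every inode as freeing (as the LRU isolate step does), then evicts
 * them in list order. Returns the index of the entry that got stuck, or -1.
 */
int sim_dispose_list(struct sim_inode **list, size_t n, int max_tries)
{
	for (size_t i = 0; i < n; i++)
		list[i]->freeing = true;
	for (size_t i = 0; i < n; i++)
		if (!sim_evict(list[i], max_tries))
			return (int)i;
	return -1;
}

/* Builds a two-entry disposal list in the requested order and runs it. */
int sim_run(bool dir_first)
{
	struct sim_inode child = { false, true, NULL };
	struct sim_inode dir = { false, true, &child };
	struct sim_inode *list[2];

	list[0] = dir_first ? &dir : &child;
	list[1] = dir_first ? &child : &dir;
	return sim_dispose_list(list, 2, 1000);
}
```

Calling sim_run(true) places the xattr dir inode first and reports entry 0 as stuck, since the child still has its dbuf user and igrab keeps failing. Calling sim_run(false) places the child first, so its teardown happens before the directory looks it up, and the list is processed to the end.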