Problem Details:
---------------

Helper scripts (create /zfs.img and mount it on /zfs, and set up kprobe events for debugging):

$ sudo ./zfs-mount.sh
$ sudo ./zfs-kprobes.sh

Print kprobe events to the screen as we go:

$ sudo cat /sys/kernel/debug/tracing/trace_pipe &
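
Each trace_pipe record uses ftrace's default layout: task-PID, [CPU], the 4-character flags column (irqs-off, need-resched, hardirq/softirq, preempt-depth), a timestamp, the kprobe event name, the probe site in parentheses, and then the fetch arguments defined by the helper script. Below is a minimal Python sketch for turning saved trace_pipe output into dictionaries. It assumes one record per line (as trace_pipe emits them, before any e-mail wrapping) and arguments of the form name=0xHEX; the function name parse_record is hypothetical.

```python
import re
from typing import Optional

# Assumed layout of one record, e.g.:
#   "touch-20059 [000] d...  6718.949791: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x560 [zfs]) obj=0x7"
# The task name may be "<...>" when ftrace has not cached the comm yet.
RECORD = re.compile(
    r"^\s*(?P<task>.+?)-(?P<pid>\d+)\s+"   # task name and PID
    r"\[(?P<cpu>\d+)\]\s+"                  # CPU number
    r"(?P<flags>\S{4})\s+"                  # irqs-off / need-resched / irq / preempt flags
    r"(?P<ts>\d+\.\d+):\s+"                 # timestamp in seconds
    r"(?P<event>\w+):\s+"                   # kprobe event name
    r"\((?P<site>[^)]*)\)"                  # probe site, e.g. "zfs_zget+0x0/0x230 [zfs]"
    r"(?P<args>.*)$"                        # fetch arguments, e.g. " obj=0x7"
)
ARG = re.compile(r"(\w+)=(0x[0-9a-fA-F]+)")


def parse_record(line: str) -> Optional[dict]:
    """Parse one trace_pipe line; return None if it is not a kprobe record."""
    m = RECORD.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["pid"] = int(rec["pid"])
    rec["cpu"] = int(rec["cpu"])
    rec["ts"] = float(rec["ts"])
    # Convert the "name=0xHEX" fetch arguments to integers.
    rec["args"] = {k: int(v, 16) for k, v in ARG.findall(rec["args"])}
    return rec
```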

Create file:
- allocates normal/file znode (flag=0x0)
- - its object number is obj=0x7
- - its znode pointer is zpp=0xffff8800a65f8000

$ touch /zfs/file
           <...>-20059 [000] d...  6718.949684: p_zfs_mknode_0: (zfs_mknode+0x0/0xe10 [zfs]) flag=0x0 dzp=0xffff8802115b0000
           touch-20059 [000] d...  6718.949791: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x560 [zfs]) obj=0x7
           touch-20059 [000] d...  6718.949806: r_zfs_znode_alloc_0: (zfs_mknode+0x8ae/0xe10 [zfs] <- zfs_znode_alloc) zpp=0xffff8800a65f8000
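
The object number and the znode pointer come from two different probes: the entry probe on zfs_znode_alloc() records obj, and the matching return probe records the returned znode as zpp. A minimal sketch of pairing them by PID to build an object-number to znode-pointer history, assuming the entry and return records of one call share a PID and are not interleaved with another zfs_znode_alloc() call on the same thread; znode_history is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# "-PID [CPU] ... EVENT: (SITE) ARGS" part of a trace_pipe record.
RECORD = re.compile(r"-(\d+)\s+\[\d+\].*?\s(\w+):\s+\(.*?\)(.*)$")
ARG = re.compile(r"(\w+)=(0x[0-9a-fA-F]+)")


def znode_history(lines: Iterable[str]) -> Dict[int, List[int]]:
    """Map each object number to the znode pointers allocated for it, in trace order."""
    pending: Dict[int, int] = {}                      # PID -> obj from the entry probe
    history: Dict[int, List[int]] = defaultdict(list)
    for line in lines:
        m = RECORD.search(line)
        if m is None:
            continue
        pid, event = int(m.group(1)), m.group(2)
        args = {k: int(v, 16) for k, v in ARG.findall(m.group(3))}
        if event.startswith("p_zfs_znode_alloc"):
            pending[pid] = args["obj"]
        elif event.startswith("r_zfs_znode_alloc") and pid in pending:
            # The return probe of the same thread completes the pair.
            history[pending.pop(pid)].append(args["zpp"])
    return dict(history)
```

Fed the whole trace, object 0x8 ends up with two znode pointers (0xffff8802111a8000 from setfattr, then 0xffff8802115e0000 from the zfs_zget() in zfs_rmnode()), which is the xattr dir node being brought back.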

Set extended attribute on the file:
- allocates xattr directory znode (flag=0x2)
- - its parent znode is file znode (dzp=0xffff8800a65f8000)
- - its object number is obj=0x8
- - its znode pointer is zpp=0xffff8802111a8000

- allocates xattr znode (flag=0x0, inherits xattr bit from parent node)
- - its parent znode is xattr dir znode (dzp=0xffff8802111a8000)
- - its object number is obj=0x9
- - its znode pointer is zpp=0xffff8802111a8448

$ setfattr -n user.debug -v 1 /zfs/file
           <...>-31701 [004] d...  6770.933127: p_zfs_mknode_0: (zfs_mknode+0x0/0xe10 [zfs]) flag=0x2 dzp=0xffff8800a65f8000
           <...>-31701 [004] d...  6770.933287: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x560 [zfs]) obj=0x8
           <...>-31701 [004] d...  6770.933312: r_zfs_znode_alloc_0: (zfs_mknode+0x8ae/0xe10 [zfs] <- zfs_znode_alloc) zpp=0xffff8802111a8000

           <...>-31701 [004] d...  6770.933414: p_zfs_mknode_0: (zfs_mknode+0x0/0xe10 [zfs]) flag=0x0 dzp=0xffff8802111a8000
           <...>-31701 [004] d...  6770.933436: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x560 [zfs]) obj=0x9
        setfattr-31701 [004] d...  6770.933441: r_zfs_znode_alloc_0: (zfs_mknode+0x8ae/0xe10 [zfs] <- zfs_znode_alloc) zpp=0xffff8802111a8448
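
Since every zfs_mknode() call logs its parent (dzp) and the following return probe logs the new znode (zpp), the file -> xattr dir -> xattr chain can be rebuilt mechanically. A minimal sketch, assuming the records have already been reduced to (event, args) pairs with the "_0" suffix stripped, and that each zfs_mknode() call finishes its allocation before the next one starts on the same thread. Reading flag=0x2 as IS_XATTR is an assumption based on the trace; Znode, build_tree and parent_obj are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

IS_XATTR = 0x2  # assumed meaning of flag=0x2 in the zfs_mknode() probe


@dataclass
class Znode:
    obj: int   # object number (from p_zfs_znode_alloc)
    zpp: int   # znode pointer (from r_zfs_znode_alloc)
    dzp: int   # parent znode pointer (from p_zfs_mknode)
    flag: int  # zfs_mknode() flag


def build_tree(events: Iterable[Tuple[str, Dict[str, int]]]) -> Dict[int, Znode]:
    """Rebuild created znodes, keyed by znode pointer, from one thread's events."""
    nodes: Dict[int, Znode] = {}
    mknode: Optional[Dict[str, int]] = None
    obj: Optional[int] = None
    for event, args in events:
        if event == "p_zfs_mknode":
            mknode = args                    # dzp and flag of the node being created
        elif event == "p_zfs_znode_alloc":
            obj = args["obj"]
        elif event == "r_zfs_znode_alloc" and mknode is not None and obj is not None:
            nodes[args["zpp"]] = Znode(obj, args["zpp"], mknode["dzp"], mknode["flag"])
            mknode = obj = None              # allocations from zfs_zget() are skipped
    return nodes


def parent_obj(nodes: Dict[int, Znode], obj: int) -> Optional[int]:
    """Object number of obj's parent, or None if the parent was not created in the trace."""
    child = next(n for n in nodes.values() if n.obj == obj)
    parent = nodes.get(child.dzp)
    return parent.obj if parent is not None else None
```

With the touch and setfattr events above, parent_obj() gives 0x8 for 0x9 and 0x7 for 0x8, while 0x7's parent (the root directory, dzp=0xffff8802115b0000) was not created in the trace and yields None.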

Remove file:
- Nothing more than zfs_zget() (i.e., "load into memory/get the znode and inode
  for an object number") on the file and the xattr dir.
- No node removal (zfs_rmnode()) yet, nor any of its descendant functions.

$ rm /zfs/file
           <...>-5240  [000] d...  6796.826938: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0xffff8802353a2000 obj=0x7
           <...>-5240  [000] d...  6796.826967: r_zfs_zget_0: (zfs_dirent_lock+0x56c/0x6c0 [zfs] <- zfs_zget)

              rm-5240  [000] d...  6796.827023: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0xffff8802353a2000 obj=0x8
              rm-5240  [000] d...  6796.827030: r_zfs_zget_0: (zfs_remove+0x22b/0x4c0 [zfs] <- zfs_zget)
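
The missing zfs_rmnode() at rm time fits the usual deferred-unlink pattern: removal only marks the znode unlinked, and the object is destroyed when its last reference goes away and zfs_zinactive() finds it unlinked. In this reproducer the file's last reference is not dropped by rm itself; as the drop_caches trace shows, it is only released when the xattr dir node is evicted. A deliberately simplified Python model of that deferral; ZnodeModel, FsModel and the reference counts are hypothetical, and the real zfs_remove() has further cases that this model ignores.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ZnodeModel:
    """Hypothetical stand-in for a znode: a reference count and an unlinked flag."""
    obj: int
    refs: int
    unlinked: bool = False


@dataclass
class FsModel:
    calls: List[Tuple[str, int]] = field(default_factory=list)

    def remove(self, zp: ZnodeModel) -> None:
        # Deferred path: only mark the znode unlinked; nothing is freed yet.
        zp.unlinked = True

    def release(self, zp: ZnodeModel) -> None:
        # Drop one reference; the last one triggers eviction.
        zp.refs -= 1
        if zp.refs == 0:
            self.zinactive(zp)

    def zinactive(self, zp: ZnodeModel) -> None:
        self.calls.append(("zfs_zinactive", zp.obj))
        if zp.unlinked:
            # Only an unlinked znode is actually destroyed on eviction.
            self.calls.append(("zfs_rmnode", zp.obj))
```

For example, a file znode created with refs=2 (one reference for rm, one pinned by the xattr dir) records no calls after remove() and the first release(), and records zfs_zinactive() followed by zfs_rmnode() for 0x7 after the second release().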

When dropping caches (e.g., the inode LRU list):
- In one disposal list (i.e., one call to dispose_list()):
  - Evict/Dispose the xattr node (obj 0x9).
  - This iput()s its parent node (obj 0x8, the xattr dir node)
    with zfs_iput_async(), thus dropping its last reference
    (which allows it to be evicted).
- In another disposal list, before ZFS's async iput() task runs:
  - Evict/Dispose the xattr dir node (obj 0x8).
  - This iput()s its parent node (obj 0x7, the file node),
    thus dropping its last reference (which allows it to be evicted).
- Then ZFS's async iput() task runs:
  - Evict/Dispose the file node (obj 0x7).
  - This triggers the node removal function, zfs_rmnode().
  - This zfs_zget()s the xattr dir node (obj 0x8), bringing it back
    (note that it gets a different znode pointer, zpp=0xffff8802115e0000),
    and then drops the reference to it with zfs_iput_async().
    Thus the xattr dir node is back again, and can/needs to be
    evicted/disposed again.

$ echo 2 | sudo tee /proc/sys/vm/drop_caches                              
...
             tee-11196 [002] d...  6823.459967: p_dispose_list_0: (dispose_list+0x0/0x50)
             tee-11196 [002] d...  6823.459975: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0xffff8802111a8660
             tee-11196 [002] d...  6823.459980: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0xffff8802111a8660
             tee-11196 [002] d...  6823.459982: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0xffff8802111a8448 obj=0x9
             tee-11196 [002] d...  6823.459994: p_zfs_iput_async_0: (zfs_iput_async+0x0/0x60 [zfs]) inode=0xffff8802111a8218 obj=0x8
             tee-11196 [002] d...  6823.460178: p_dispose_list_0: (dispose_list+0x0/0x50)
             tee-11196 [002] d...  6823.460895: p_dispose_list_0: (dispose_list+0x0/0x50)
             tee-11196 [002] d...  6823.461876: p_dispose_list_0: (dispose_list+0x0/0x50)
             tee-11196 [002] d...  6823.463307: p_dispose_list_0: (dispose_list+0x0/0x50)

             tee-11196 [002] d...  6823.463412: p_dispose_list_0: (dispose_list+0x0/0x50)
             tee-11196 [002] d...  6823.463414: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0xffff8802111a8218
             tee-11196 [002] d...  6823.463415: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0xffff8802111a8218
             tee-11196 [002] d...  6823.463416: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0xffff8802111a8000 obj=0x8
             tee-11196 [002] d...  6823.463420: p_zfs_iput_async_0: (zfs_iput_async+0x0/0x60 [zfs]) inode=0xffff8800a65f8218 obj=0x7

           <...>-30411 [007] d...  6823.463530: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0xffff8800a65f8218
          z_iput-30411 [007] d...  6823.463533: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0xffff8800a65f8218
          z_iput-30411 [007] d...  6823.463535: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0xffff8800a65f8000 obj=0x7
          z_iput-30411 [007] d...  6823.463540: p_zfs_rmnode_0: (zfs_rmnode+0x0/0x350 [zfs]) znode=0xffff8800a65f8000
          z_iput-30411 [007] d...  6823.463598: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0xffff8802353a2000 obj=0x8
          z_iput-30411 [007] d...  6823.463613: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x560 [zfs]) obj=0x8
          z_iput-30411 [007] d...  6823.463634: r_zfs_znode_alloc_0: (zfs_zget+0x1ae/0x230 [zfs] <- zfs_znode_alloc) zpp=0xffff8802115e0000
          z_iput-30411 [007] d...  6823.463636: r_zfs_zget_0: (zfs_rmnode+0x249/0x350 [zfs] <- zfs_zget)
          z_iput-30411 [007] d...  6823.463714: p_zfs_iput_async_0: (zfs_iput_async+0x0/0x60 [zfs]) inode=0xffff8802115e0218 obj=0x8
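
The ordering described for this run (0x9 and 0x8 evicted in separate disposal lists, then 0x7 evicted by the async iput task) can be checked mechanically by bucketing the zfs_zinactive() records. A minimal sketch, assuming the records have been reduced to (task, event, obj) tuples, that each p_dispose_list record starts a new disposal list, and that the z_iput task name identifies ZFS's async iput thread; eviction_batches is a hypothetical helper.

```python
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, str, Optional[int]]   # (task, event, obj), reduced from the trace
Batch = Tuple[str, List[int]]             # (context, objects evicted in it)


def eviction_batches(records: Iterable[Record]) -> List[Batch]:
    """Group objects passed to zfs_zinactive() by the context that evicted them."""
    batches: List[Batch] = []
    current: Optional[Batch] = None
    for task, event, obj in records:
        if event == "p_dispose_list":
            current = ("dispose_list", [])   # a new shrinker disposal list
            batches.append(current)
        elif event == "p_zfs_zinactive":
            context = "z_iput" if task == "z_iput" else "dispose_list"
            if current is None or current[0] != context:
                current = (context, [])      # e.g. the async iput task starts running
                batches.append(current)
            current[1].append(obj)
    # Disposal lists that evicted no ZFS inodes are not interesting here.
    return [(ctx, objs) for ctx, objs in batches if objs]
```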

When dropping the caches again:
- In one disposal list:
  - Evict/Dispose the xattr dir node (obj=0x8).
  - This triggers the node removal function, zfs_rmnode(),
    and its descendant function zfs_purgedir() for xattr dir nodes.
  - zfs_purgedir() calls zfs_zget() on the child/xattr node (obj=0x9),
    bringing it back into memory (note that it has a different znode
    pointer, zpp=0xffff880234a58000).
- In another disposal list:
  - Evict/Dispose the (brought back) xattr node.

$ echo 2 | sudo tee /proc/sys/vm/drop_caches 
...
             tee-890   [001] d...  6921.482840: p_dispose_list_0: (dispose_list+0x0/0x50)
             tee-890   [001] dN..  6921.482847: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0xffff8802115e0218
             tee-890   [001] d...  6921.483049: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0xffff8802115e0218
             tee-890   [001] d...  6921.483140: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0xffff8802115e0000 obj=0x8
             tee-890   [001] d...  6921.483243: p_zfs_rmnode_0: (zfs_rmnode+0x0/0x350 [zfs]) znode=0xffff8802115e0000
             tee-890   [001] dN..  6921.483255: p_zfs_purgedir_0: (zfs_purgedir+0x0/0x210 [zfs]) znode=0xffff8802115e0000
             tee-890   [001] d...  6921.483491: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0xffff8802353a2000 obj=0x9
             tee-890   [001] d...  6921.483595: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x560 [zfs]) obj=0x9
             tee-890   [001] d...  6921.483714: r_zfs_znode_alloc_0: (zfs_zget+0x1ae/0x230 [zfs] <- zfs_znode_alloc) zpp=0xffff880234a58000
             tee-890   [001] d...  6921.484133: r_zfs_zget_0: (zfs_purgedir+0xb4/0x210 [zfs] <- zfs_zget)
             tee-890   [001] d...  6921.484394: p_zfs_iput_async_0: (zfs_iput_async+0x0/0x60 [zfs]) inode=0xffff880234a58218 obj=0x9
             tee-890   [001] d...  6921.484521: r_zfs_purgedir_0: (zfs_rmnode+0x260/0x350 [zfs] <- zfs_purgedir)
             tee-890   [001] d...  6921.484973: p_dispose_list_0: (dispose_list+0x0/0x50)
             tee-890   [000] d...  6921.490662: p_dispose_list_0: (dispose_list+0x0/0x50)
             tee-890   [000] d...  6921.490734: p_dispose_list_0: (dispose_list+0x0/0x50)
             tee-890   [000] d...  6921.490791: p_dispose_list_0: (dispose_list+0x0/0x50)
             tee-890   [000] d...  6921.490794: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0xffff880234a58218
             tee-890   [000] d...  6921.490796: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0xffff880234a58218
             tee-890   [000] d...  6921.490798: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0xffff880234a58000 obj=0x9
             tee-890   [000] d...  6921.490802: p_zfs_rmnode_0: (zfs_rmnode+0x0/0x350 [zfs]) znode=0xffff880234a58000
             tee-890   [000] d...  6921.493071: p_dispose_list_0: (dispose_list+0x0/0x50)
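
The return probe on zfs_zget() reports its call site (the function to the left of "<-"), so pairing each entry record (which carries obj) with the next return record from the same PID shows who looked up which object: zfs_dirent_lock() and zfs_remove() during rm, zfs_rmnode() for the xattr dir, and zfs_purgedir() for the xattr node. A minimal sketch, assuming zfs_zget() calls are not nested on a thread; zget_callers is a hypothetical helper.

```python
import re
from typing import Dict, Iterable, List, Tuple

# "-PID [CPU] ... EVENT: (SITE) ARGS" part of a trace_pipe record.
RECORD = re.compile(r"-(\d+)\s+\[\d+\].*?\s(\w+):\s+\((.*?)\)(.*)$")
OBJ = re.compile(r"\bobj=(0x[0-9a-fA-F]+)")
# Return-probe site, e.g. "zfs_purgedir+0xb4/0x210 [zfs] <- zfs_zget".
CALLER = re.compile(r"^(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+.*<- zfs_zget$")


def zget_callers(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (obj, caller) for each completed zfs_zget() call, in trace order."""
    pending: Dict[int, int] = {}   # PID -> obj from the p_zfs_zget entry probe
    calls: List[Tuple[int, str]] = []
    for line in lines:
        m = RECORD.search(line)
        if m is None:
            continue
        pid, event, site, args = int(m.group(1)), m.group(2), m.group(3), m.group(4)
        if event.startswith("p_zfs_zget"):
            obj = OBJ.search(args)
            if obj is not None:
                pending[pid] = int(obj.group(1), 16)
        elif event.startswith("r_zfs_zget") and pid in pending:
            caller = CALLER.match(site)
            calls.append((pending.pop(pid), caller.group(1) if caller else site))
    return calls
```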

The problem would happen if, for some reason,
the zfs_purgedir() call for the xattr dir node (obj=0x8) calls zfs_zget() on
the xattr node (obj=0x9) while the latter:
- has not yet been evicted/disposed (so dmu_buf_get_user() still returns
  non-NULL), but
- is positioned later in the same disposal list, i.e., already marked for
  disposal (so igrab() returns NULL due to inode.i_state).

These two conditions create an infinite loop in zfs_zget(), which is
effectively a deadlock, because:
1) zfs_zget() would only finish if dmu_buf_get_user() returned NULL,
   which only occurs once the _xattr inode_ goes through the disposal path
   (evict() -> zpl_evict_inode() -> zfs_inactive() -> zfs_zinactive() ->
   zfs_znode_dmu_fini() -> sa_handle_destroy() -> dmu_buf_remove_user());
2) and that disposal is blocked waiting on the (looping) disposal of the
   _xattr dir inode_ (because the xattr inode is later in the disposal list),
   which is itself waiting on the disposal of the _xattr inode_.
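
The livelock can be reproduced in a few lines of Python. This is a simplified model, not ZFS code: a dictionary stands in for dmu_buf_get_user() (an object stays there until its inode is evicted), igrab() refuses inodes marked I_FREEING or I_WILL_FREE as the Linux helper does, and dispose_list_model() marks every victim I_FREEING before evicting any of them, as the kernel does when it isolates LRU inodes onto a private disposal list. Evicting a directory listed in `children` models zfs_rmnode() -> zfs_purgedir() -> zfs_zget() on its child. The state-bit values, the retry cap and all function names are assumptions of the model (the real loop has no cap), and locking and the unlinked-znode case are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Illustrative inode state bits for this model only.
I_WILL_FREE = 1 << 0
I_FREEING = 1 << 1


@dataclass
class Inode:
    obj: int
    state: int = 0
    count: int = 0


def igrab(inode: Inode) -> Optional[Inode]:
    """Like Linux igrab(): refuse an inode that is already being freed."""
    if inode.state & (I_FREEING | I_WILL_FREE):
        return None
    inode.count += 1
    return inode


def zfs_zget_model(attached: Dict[int, Inode], obj: int, max_retries: int) -> Inode:
    """Simplified zfs_zget() loop; `attached` plays the role of dmu_buf_get_user()."""
    for _ in range(max_retries):
        inode = attached.get(obj)
        if inode is None:
            # Nothing attached: allocate a new znode/inode (zfs_znode_alloc()).
            inode = Inode(obj, count=1)
            attached[obj] = inode
            return inode
        if igrab(inode) is not None:
            return inode          # cached and live: reuse it
        # Attached but being freed: drop locks and retry ("goto again").
    raise RuntimeError(f"zfs_zget({obj:#x}) still looping after {max_retries} retries")


def dispose_list_model(attached: Dict[int, Inode], victims: List[int],
                       children: Dict[int, List[int]], max_retries: int = 1000) -> List[int]:
    """Evict `victims` in order after marking all of them I_FREEING up front."""
    for obj in victims:
        attached[obj].state |= I_FREEING
    evicted = []
    for obj in victims:
        for child in children.get(obj, []):
            # zfs_rmnode() -> zfs_purgedir() looks up each child of a dying xattr dir.
            zfs_zget_model(attached, child, max_retries)
        del attached[obj]         # sa_handle_destroy() -> dmu_buf_remove_user()
        evicted.append(obj)
    return evicted
```

With only the xattr dir on the list and the xattr node already gone, as in the second drop_caches trace, dispose_list_model({0x8: Inode(0x8)}, [0x8], {0x8: [0x9]}) returns [0x8] and leaves a freshly allocated 0x9 behind. With both nodes attached and the xattr node later on the same list ([0x8, 0x9]), the lookup of 0x9 keeps finding an attached inode that igrab() refuses, so the model raises RuntimeError where the kernel would spin forever.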



** Bug watch added: Github Issue Tracker for ZFS #4816
   https://github.com/zfsonlinux/zfs/issues/4816

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1839521

Title:
  Xenial: ZFS deadlock in shrinker path with xattrs

Status in zfs-linux package in Ubuntu:
  Invalid
Status in zfs-linux source package in Xenial:
  In Progress
Status in zfs-linux source package in Bionic:
  Invalid
Status in zfs-linux source package in Disco:
  Invalid
Status in zfs-linux source package in Eoan:
  Invalid

Bug description:
  [Impact]

   * Xenial's ZFS can deadlock in the memory shrinker path
     after removing files with extended attributes (xattr).

   * Extended attributes are enabled by default, but are
     _not_ used by default, which reduces the likelihood.

   * It's very difficult/rare to reproduce this problem,
     due to the file/xattr/remove/shrinker/LRU order and
     timing circumstances required (it took weeks for the
     reporting user to hit it), but a synthetic test case
     has been found for testing.

  [Test Case]

   * A synthetic reproducer is available for this LP,
     with a few steps to touch/setfattr/rm/drop_caches
     plus a kernel module to massage the disposal list.

   * In the original ZFS module:
     the xattr dir inode is not purged immediately on
     file removal, but possibly purged _two_ shrinker
     invocations later.  This allows another thread,
     started before the file removal, to call zfs_zget()
     on the xattr child inode and iput() it, so that it
     makes it onto the same disposal list as the xattr
     dir inode.

   * In the modified ZFS module:
     the xattr dir inode is purged immediately on file
     removal, not possibly later in a shrinker invocation,
     so the problem window above no longer exists.

  [Regression Potential]

   * Low. The patches are confined to extended attributes
     in ZFS, specifically node removal/purge, plus another
     change to how an xattr child inode tracks its xattr
     dir (parent) inode, so that the latter can be purged
     immediately on removal.

   * The ZFS test-suite has been run on original/modified
     zfs-dkms package/kernel modules, with no regressions.


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
