On Wed, Oct 26, 2016 at 11:42 AM, Dave Jones <da...@codemonkey.org.uk> wrote:
>
> The stacks show nearly all of them are stuck in sync_inodes_sb

That's just wb_wait_for_completion(), and it means that some IO isn't completing.

There's also a lot of processes waiting for inode_lock(), and a few waiting for mnt_want_write()

Ignoring those, we have

> [<ffffffffa009554f>] btrfs_wait_ordered_roots+0x3f/0x200 [btrfs]
> [<ffffffffa00470d1>] btrfs_sync_fs+0x31/0xc0 [btrfs]
> [<ffffffff811fbd4e>] sync_filesystem+0x6e/0xa0
> [<ffffffff811fbebc>] SyS_syncfs+0x3c/0x70
> [<ffffffff8100255c>] do_syscall_64+0x5c/0x170
> [<ffffffff817908cb>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff

Don't know this one. There's a couple of them. Could there be some ABBA deadlock on the ordered roots waiting?

> [<ffffffff8131ae87>] call_rwsem_down_write_failed+0x17/0x30
> [<ffffffffa008ed32>] btrfs_fallocate+0xb2/0xfd0 [btrfs]
> [<ffffffff811c6c3e>] vfs_fallocate+0x13e/0x220
> [<ffffffff811c79f3>] SyS_fallocate+0x43/0x80
> [<ffffffff8100255c>] do_syscall_64+0x5c/0x170
> [<ffffffff817908cb>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff

This one is also inode_lock(), and is interesting only because it's fallocate(), which has shown up so many times before..

But there are other threads blocked on do_truncate, or btrfs_file_write_iter instead, or on lseek, so this is not different for any other reason.

> [<ffffffff81149fbf>] wait_on_page_bit+0xaf/0xc0
> [<ffffffff8114a121>] __filemap_fdatawait_range+0x151/0x170
> [<ffffffff8114d79c>] filemap_fdatawait_keep_errors+0x1c/0x20
> [<ffffffff811f59b3>] sync_inodes_sb+0x273/0x300
> [<ffffffff811fbd37>] sync_filesystem+0x57/0xa0
> [<ffffffff811fbebc>] SyS_syncfs+0x3c/0x70
> [<ffffffff8100255c>] do_syscall_64+0x5c/0x170
> [<ffffffff817908cb>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff

This is actually waiting on the page. Possibly this is the IO that is never completing, and keeps the inode lock.

> [<ffffffffa009576b>] btrfs_start_ordered_extent+0x5b/0xb0 [btrfs]
> [<ffffffffa008bf5d>] lock_and_cleanup_extent_if_need+0x22d/0x290 [btrfs]
> [<ffffffffa008d1e8>] __btrfs_buffered_write+0x1b8/0x6e0 [btrfs]
> [<ffffffffa0090e60>] btrfs_file_write_iter+0x170/0x550 [btrfs]
> [<ffffffff811c97d8>] do_iter_readv_writev+0xa8/0x100
> [<ffffffff811ca162>] do_readv_writev+0x172/0x210
> [<ffffffff811ca42a>] vfs_writev+0x3a/0x50
> [<ffffffff811ca5c0>] do_pwritev+0xb0/0xd0
> [<ffffffff811cb57c>] SyS_pwritev+0xc/0x10
> [<ffffffff8100255c>] do_syscall_64+0x5c/0x170
> [<ffffffff817908cb>] entry_SYSCALL64_slow_path+0x25/0x25

Hmm. This is the one that *started* the ordered extents (as opposed to the ones waiting for it)

I dunno. There might be a lost IO. More likely it's the same corruption that causes it, it just didn't result in an oops this time.

Linus