Booting the very latest -tip on my test machine gets me the below splat.
Dave, TJ, FYI, lockdep grew annotations for completions; it remembers which
locks were taken before we complete() and checks that none of those are held
while we wait_for_completion(). That is, something like:

	mutex_lock(&A);
	mutex_unlock(&A);
	complete(&C);

	mutex_lock(&A);
	wait_for_completion(&C);

is now considered a problem. Note that in order for C to link to A, it needs
to have observed the complete() thread acquiring A after a
wait_for_completion(), something like:

	wait_for_completion(&C)
		mutex_lock(&A);
		mutex_unlock(&A);
	complete(&C);

That is, only locks observed to be taken between C's wait_for_completion()
and its complete() are considered.

Now, given the above observance rule and the fact that the below report is
from the complete side, the thing that happened appears to be:

	lockdep_map_acquire(&work->lockdep_map)
	down_write(&A)

	down_write(&A)
	wait_for_completion(&C)

	lockdep_map_acquire(&work->lockdep_map)
	complete(&C)

which lockdep then puked over because both sides saw the same work class.

Byungchul, should we not exclude the work class itself? It appears to me the
workqueue code is explicitly parallel, or am I confused again?
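To make that shape concrete, here is a minimal kernel-style sketch of the
pattern the new annotations reject; the identifiers (pattern_mutex,
pattern_done, completer(), waiter()) are mine and purely illustrative, not
from the report below:

	#include <linux/completion.h>
	#include <linux/mutex.h>

	static DEFINE_MUTEX(pattern_mutex);		/* A */
	static DECLARE_COMPLETION(pattern_done);	/* C */

	/* Once lockdep sees pattern_mutex taken between C's
	 * wait_for_completion() and its complete(), it records C -> A. */
	static int completer(void *unused)
	{
		mutex_lock(&pattern_mutex);
		mutex_unlock(&pattern_mutex);
		complete(&pattern_done);
		return 0;
	}

	/* Holding A across the wait then closes the cycle: if the
	 * completer blocks on A before it can complete(), we deadlock. */
	static int waiter(void *unused)
	{
		mutex_lock(&pattern_mutex);
		wait_for_completion(&pattern_done);
		mutex_unlock(&pattern_mutex);
		return 0;
	}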
[ 38.882358] ======================================================
[ 38.889256] WARNING: possible circular locking dependency detected
[ 38.896155] 4.13.0-rc6-00609-g645737361ab6-dirty #3 Not tainted
[ 38.902752] ------------------------------------------------------
[ 38.909650] kworker/0:0/3 is trying to acquire lock:
[ 38.915189]  ((&ioend->io_work)){+.+.}, at: [<ffffffff8110ed5f>] process_one_work+0x1ef/0x6b0
[ 38.924715]
[ 38.924715] but now in release context of a crosslock acquired at the following:
[ 38.934618]  ((complete)&bp->b_iowait){+.+.}, at: [<ffffffff8151ce42>] xfs_buf_submit_wait+0xb2/0x590
[ 38.944919]
[ 38.944919] which lock already depends on the new lock.
[ 38.944919]
[ 38.954046]
[ 38.954046] the existing dependency chain (in reverse order) is:
[ 38.962397]
[ 38.962397] -> #2 ((complete)&bp->b_iowait){+.+.}:
[ 38.969401]        __lock_acquire+0x10a1/0x1100
[ 38.974460]        lock_acquire+0xea/0x1f0
[ 38.979036]        wait_for_completion+0x3b/0x130
[ 38.984285]        xfs_buf_submit_wait+0xb2/0x590
[ 38.989535]        _xfs_buf_read+0x23/0x30
[ 38.994108]        xfs_buf_read_map+0x14f/0x320
[ 38.999169]        xfs_trans_read_buf_map+0x170/0x5d0
[ 39.004812]        xfs_read_agf+0x97/0x1d0
[ 39.009386]        xfs_alloc_read_agf+0x60/0x240
[ 39.014541]        xfs_alloc_fix_freelist+0x32a/0x3d0
[ 39.020180]        xfs_free_extent_fix_freelist+0x6b/0xa0
[ 39.026206]        xfs_free_extent+0x48/0x120
[ 39.031068]        xfs_trans_free_extent+0x57/0x200
[ 39.036512]        xfs_extent_free_finish_item+0x26/0x40
[ 39.042442]        xfs_defer_finish+0x174/0x770
[ 39.047494]        xfs_itruncate_extents+0x126/0x470
[ 39.053035]        xfs_setattr_size+0x2a1/0x310
[ 39.058093]        xfs_vn_setattr_size+0x57/0x160
[ 39.063342]        xfs_vn_setattr+0x50/0x70
[ 39.068015]        notify_change+0x2ee/0x420
[ 39.072785]        do_truncate+0x5d/0x90
[ 39.077165]        path_openat+0xab2/0xc00
[ 39.081737]        do_filp_open+0x8a/0xf0
[ 39.086213]        do_sys_open+0x123/0x200
[ 39.090787]        SyS_open+0x1e/0x20
[ 39.094876]        entry_SYSCALL_64_fastpath+0x23/0xc2
[ 39.100611]
[ 39.100611] -> #1 (&xfs_nondir_ilock_class){++++}:
[ 39.107612]        __lock_acquire+0x10a1/0x1100
[ 39.112660]        lock_acquire+0xea/0x1f0
[ 39.117224]        down_write_nested+0x26/0x60
[ 39.122184]        xfs_ilock+0x166/0x220
[ 39.126563]        __xfs_setfilesize+0x30/0x200
[ 39.131612]        xfs_setfilesize_ioend+0x7f/0xb0
[ 39.136958]        xfs_end_io+0x49/0xf0
[ 39.141240]        process_one_work+0x273/0x6b0
[ 39.146288]        worker_thread+0x48/0x3f0
[ 39.150960]        kthread+0x147/0x180
[ 39.155146]        ret_from_fork+0x2a/0x40
[ 39.159708]
[ 39.159708] -> #0 ((&ioend->io_work)){+.+.}:
[ 39.166126]        process_one_work+0x244/0x6b0
[ 39.171184]        worker_thread+0x48/0x3f0
[ 39.175845]        kthread+0x147/0x180
[ 39.180020]        ret_from_fork+0x2a/0x40
[ 39.184593]        0xffffffffffffffff
[ 39.188677]
[ 39.188677] other info that might help us debug this:
[ 39.188677]
[ 39.197611] Chain exists of:
[ 39.197611]   (&ioend->io_work) --> &xfs_nondir_ilock_class --> (complete)&bp->b_iowait
[ 39.197611]
[ 39.211399]  Possible unsafe locking scenario by crosslock:
[ 39.211399]
[ 39.219268]        CPU0                    CPU1
[ 39.224321]        ----                    ----
[ 39.229377]   lock(&xfs_nondir_ilock_class);
[ 39.234135]   lock((complete)&bp->b_iowait);
[ 39.238902]                                lock((&ioend->io_work));
[ 39.245889]                                unlock((complete)&bp->b_iowait);
[ 39.253660]
[ 39.253660]  *** DEADLOCK ***
[ 39.253660]
[ 39.260267] 3 locks held by kworker/0:0/3:
[ 39.264838]  #0:  ("xfs-buf/%s"mp->m_fsname){++++}, at: [<ffffffff8110ed5f>] process_one_work+0x1ef/0x6b0
[ 39.275523]  #1:  ((&bp->b_ioend_work)){+.+.}, at: [<ffffffff8110ed5f>] process_one_work+0x1ef/0x6b0
[ 39.285721]  #2:  (&x->wait#17){....}, at: [<ffffffff8113daed>] complete+0x1d/0x60
[ 39.294176]
[ 39.294176] stack backtrace:
[ 39.299039] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.13.0-rc6-00609-g645737361ab6-dirty #3
[ 39.308749] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[ 39.320209] Workqueue: xfs-buf/sdb1 xfs_buf_ioend_work
[ 39.325945] Call Trace:
[ 39.328677]  dump_stack+0x86/0xcf
[ 39.332377]  print_circular_bug+0x203/0x2f0
[ 39.337048]  check_prev_add+0x3be/0x700
[ 39.341331]  ? print_bfs_bug+0x40/0x40
[ 39.345518]  lock_commit_crosslock+0x40d/0x5a0
[ 39.350479]  ? lock_commit_crosslock+0x40d/0x5a0
[ 39.355633]  ? xfs_buf_ioend_work+0x15/0x20
[ 39.360304]  complete+0x29/0x60
[ 39.363810]  xfs_buf_ioend+0x15e/0x470
[ 39.367987]  xfs_buf_ioend_work+0x15/0x20
[ 39.372463]  process_one_work+0x273/0x6b0
[ 39.376931]  worker_thread+0x48/0x3f0
[ 39.381021]  kthread+0x147/0x180
[ 39.384624]  ? process_one_work+0x6b0/0x6b0
[ 39.389294]  ? kthread_create_on_node+0x40/0x40
[ 39.394351]  ret_from_fork+0x2a/0x40
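For reference, the lockdep_map_acquire(&work->lockdep_map) in the sequence
above is the per-work-class pseudo-lock that process_one_work() wraps around
every work callback. A heavily elided sketch of the relevant part of
kernel/workqueue.c (the real function also deals with flush colours, CPU
binding and so on, so treat this as a sketch rather than the exact 4.13
code):

	static void process_one_work(struct worker *worker,
				     struct work_struct *work)
	{
		/* Stack copy: the work item may be freed by its own
		 * callback, but the class survives in the copy. */
		struct lockdep_map lockdep_map;

		lockdep_copy_map(&lockdep_map, &work->lockdep_map);
		...
		lock_map_acquire(&pwq->wq->lockdep_map); /* "xfs-buf/%s" above */
		lock_map_acquire(&lockdep_map);		 /* the work class */

		worker->current_func(work);	/* here: xfs_buf_ioend_work(),
						 * which ends up doing
						 * complete(&bp->b_iowait) */

		lock_map_release(&lockdep_map);
		lock_map_release(&pwq->wq->lockdep_map);
	}

Every work item, on both the wait side and the complete side, runs under such
an acquire, which is how the complete side came to hold a work-class map here.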