Some additional info about the deadlock:

crash> bt 16588
PID: 16588  TASK: ffff9ffd7f332b00  CPU: 1  COMMAND: "bcache_allocato"
    [exception RIP: bch_crc64+57]
    RIP: ffffffffc093b2c9  RSP: ffffab9585767e28  RFLAGS: 00000286
    RAX: f1f51403756de2bd  RBX: 0000000000000000  RCX: 0000000000000065
    RDX: 0000000000000065  RSI: ffff9ffd63980000  RDI: ffff9ffd63925346
    RBP: ffffab9585767e28   R8: ffffffffc093db60   R9: ffffab9585739000
    R10: 000000000000007f  R11: 000000001ffef001  R12: 0000000000000000
    R13: 0000000000000008  R14: ffff9ffd63900000  R15: ffff9ffd683d0000
    CS: 0010  SS: 0018
 #0 [ffffab9585767e30] bch_prio_write at ffffffffc09325c0 [bcache]
 #1 [ffffab9585767eb0] bch_allocator_thread at ffffffffc091bdc5 [bcache]
 #2 [ffffab9585767f08] kthread at ffffffffa80b2481
 #3 [ffffab9585767f50] ret_from_fork at ffffffffa8a00205
crash> bt 14658
PID: 14658  TASK: ffff9ffd7a9f0000  CPU: 0  COMMAND: "python3"
 #0 [ffffab958380bb48] __schedule at ffffffffa89ae441
 #1 [ffffab958380bbe8] schedule at ffffffffa89aea7c
 #2 [ffffab958380bbf8] bch_bucket_alloc at ffffffffc091c370 [bcache]
 #3 [ffffab958380bc68] __bch_bucket_alloc_set at ffffffffc091c5ce [bcache]
 #4 [ffffab958380bcb8] bch_bucket_alloc_set at ffffffffc091c66e [bcache]
 #5 [ffffab958380bcf8] __uuid_write at ffffffffc0931b69 [bcache]
 #6 [ffffab958380bda0] bch_uuid_write at ffffffffc0931f76 [bcache]
 #7 [ffffab958380bdc0] __cached_dev_store at ffffffffc0937c08 [bcache]
 #8 [ffffab958380be20] bch_cached_dev_store at ffffffffc0938309 [bcache]
 #9 [ffffab958380be50] sysfs_kf_write at ffffffffa830c97c
#10 [ffffab958380be60] kernfs_fop_write at ffffffffa830c3e5
#11 [ffffab958380bea0] __vfs_write at ffffffffa827e5bb
#12 [ffffab958380beb0] vfs_write at ffffffffa827e781
#13 [ffffab958380bee8] sys_write at ffffffffa827e9fc
#14 [ffffab958380bf30] do_syscall_64 at ffffffffa8003b03
#15 [ffffab958380bf50] entry_SYSCALL_64_after_hwframe at ffffffffa8a00081
    RIP: 00007faffc7bd154  RSP: 00007ffe307cbc88  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000008  RCX: 00007faffc7bd154
    RDX: 0000000000000008  RSI: 00000000011ce7f0  RDI: 0000000000000003
    RBP: 00007faffccb86c0   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000100  R11: 0000000000000246  R12: 0000000000000003
    R13: 0000000000000000  R14: 00000000011ce7f0  R15: 0000000000f33e60
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

In this case the task "python3" (pid 14658) gets stuck in bch_bucket_alloc(), in a wait that never completes. The task that should resume "python3" from this wait is "bcache_allocator" (pid 16588), but the resume never happens, because bcache_allocator is itself stuck in this "retry_invalidate" busy loop:

static int bch_allocator_thread(void *arg)
{
	...
retry_invalidate:
		allocator_wait(ca, ca->set->gc_mark_valid &&
			       !ca->invalidate_needs_gc);
		invalidate_buckets(ca);

		/*
		 * Now, we write their new gens to disk so we can start writing
		 * new stuff to them:
		 */
		allocator_wait(ca, !atomic_read(&ca->set->prio_blocked));
		if (CACHE_SYNC(&ca->set->sb)) {
			/*
			 * This could deadlock if an allocation with a btree
			 * node locked ever blocked - having the btree node
			 * locked would block garbage collection, but here we're
			 * waiting on garbage collection before we invalidate
			 * and free anything.
			 *
			 * But this should be safe since the btree code always
			 * uses btree_check_reserve() before allocating now, and
			 * if it fails it blocks without btree nodes locked.
			 */
			if (!fifo_full(&ca->free_inc))
				goto retry_invalidate;

			if (bch_prio_write(ca, false) < 0) {
				ca->invalidate_needs_gc = 1;
				wake_up_gc(ca->set);
				goto retry_invalidate;
			}
		}
	}
	...

The exact code path is the following: bch_prio_write() fails because it calls bch_bucket_alloc(), which fails (out of free buckets); the allocator then wakes up the garbage collector (trying to free up some buckets) and jumps back to retry_invalidate. Apparently that is not enough: bch_prio_write() keeps failing over and over again (still no buckets available), so the allocator can never break out of the busy loop => deadlock.

Looking more closely at the code, it seems safe to resume the bcache_allocator main loop when bch_prio_write() fails (still keeping the wake-up event to the garbage collector), instead of jumping back to retry_invalidate. This should give the allocator a better chance to free up some buckets, possibly preventing the "out of buckets" deadlock condition.
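To make the idea concrete, this is roughly what that change looks like (just a sketch of the idea described above; the actual patch I end up testing/uploading may differ slightly): drop the goto in the bch_prio_write() error path, so the allocator resumes its main loop while gc (which has already been woken up) gets a chance to free some buckets:

			if (bch_prio_write(ca, false) < 0) {
				ca->invalidate_needs_gc = 1;
				wake_up_gc(ca->set);
				/*
				 * No "goto retry_invalidate" here anymore: just
				 * fall through and let the allocator resume its
				 * main loop, instead of busy-looping on
				 * retry_invalidate while no buckets are
				 * available.
				 */
			}

Everything else stays the same: invalidate_needs_gc is still set and the garbage collector is still woken up; only the busy-loop jump goes away.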
I'm currently testing a kernel with this change; if I can't trigger any deadlock in the next hour or so, I'll upload a new test kernel.

--
https://bugs.launchpad.net/bugs/1796292

Title:
  Tight timeout for bcache removal causes spurious failures

Status in curtin:
  Fix Released
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Confirmed

Bug description:
  I've had a number of deployment faults where curtin would report
  "Timeout exceeded for removal of /sys/fs/bcache/xxx" when doing a
  mass-deployment of 30+ nodes. Upon retrying, the node would usually
  deploy fine.

  Experimentally I've set the timeout ridiculously high, and it seems
  I'm getting no faults with this. I'm wondering if the timeout for
  removal is set too tight, or might need to be made configurable.

  --- curtin/util.py~     2018-05-18 18:40:48.000000000 +0000
  +++ curtin/util.py      2018-10-05 09:40:06.807390367 +0000
  @@ -263,7 +263,7 @@
       return _subp(*args, **kwargs)


  -def wait_for_removal(path, retries=[1, 3, 5, 7]):
  +def wait_for_removal(path, retries=[1, 3, 5, 7, 1200, 1200]):
       if not path:
           raise ValueError('wait_for_removal: missing path parameter')