(forwarding to current-users because tech-misc appears to be inactive)

----- Forwarded message from Alexander Nasonov <[email protected]> -----
Date: Sun, 5 Sep 2021 22:16:48 +0100
From: Alexander Nasonov <[email protected]>
To: [email protected]
Subject: zpool import skips wedges due to a race condition

zpool import reliably fails to detect a pool on my server. The pool
lives on cgd1:

# dkctl cgd1 listwedges
/dev/rcgd1: 1 wedge:
dk24: zcgdroot, 6688832954 blocks at 34, type: zfs

When I run zpool import, it launches 32 threads and opens 32 disks in
parallel, including cgd1 and dk24. But it can't open dk24 while cgd1
is still open (the open fails with EBUSY).

I fixed it with the attached patch by running only one thread. It's
not the best approach, but I'm not sure how to fix it properly.

Alex

Index: libzfs_import.c
===================================================================
RCS file: /cvsroot/src/external/cddl/osnet/dist/lib/libzfs/common/libzfs_import.c,v
retrieving revision 1.7
diff -p -u -u -r1.7 libzfs_import.c
--- libzfs_import.c	28 Aug 2021 10:47:45 -0000	1.7
+++ libzfs_import.c	5 Sep 2021 20:50:35 -0000
@@ -1326,9 +1326,11 @@ skipdir:
 	 * double the number of processors; we hold a lot of
 	 * locks in the kernel, so going beyond this doesn't
 	 * buy us much.
+	 * XXX It's not a very smart idea to open all disks in
+	 * parallel because wedges on NetBSD can't be open while
+	 * a parent disk is open. For now, only run one thread.
 	 */
-	t = tpool_create(1, 2 * sysconf(_SC_NPROCESSORS_ONLN),
-	    0, NULL);
+	t = tpool_create(1, 1, 0, NULL);
 	for (slice = avl_first(&slice_cache); slice;
 	    (slice = avl_walk(&slice_cache, slice, AVL_AFTER)))

----- End forwarded message -----
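
[Editor's note: the race can be sketched as a minimal, self-contained C
simulation. The fake_* functions below are made up for illustration and
are not NetBSD APIs; they only model the constraint the report describes,
namely that opening a wedge (dk24) returns EBUSY while its parent disk
(cgd1) is held open, so a parallel device scan can fail where a serialized
one succeeds.]

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static bool parent_is_open = false;

/* Stand-in for open(2) on the parent disk, e.g. /dev/rcgd1. */
static int
fake_parent_open(void)
{
	pthread_mutex_lock(&lock);
	parent_is_open = true;
	pthread_mutex_unlock(&lock);
	return 0;
}

static void
fake_parent_close(void)
{
	pthread_mutex_lock(&lock);
	parent_is_open = false;
	pthread_mutex_unlock(&lock);
}

/*
 * Stand-in for open(2) on the wedge, e.g. /dev/rdk24: refuses to open
 * while the parent disk is open, modelling the EBUSY from the report.
 */
static int
fake_wedge_open(void)
{
	int rc;

	pthread_mutex_lock(&lock);
	rc = parent_is_open ? -EBUSY : 0;
	pthread_mutex_unlock(&lock);
	return rc;
}

int
main(void)
{
	/*
	 * Parallel scan: another worker still holds the parent open
	 * when the wedge is tried, so the open fails and the pool
	 * label on the wedge is never read.
	 */
	assert(fake_parent_open() == 0);
	assert(fake_wedge_open() == -EBUSY);
	fake_parent_close();

	/*
	 * Serialized scan (the effect of tpool_create(1, 1, 0, NULL)):
	 * the parent is closed before the wedge is tried, so the
	 * wedge open succeeds.
	 */
	assert(fake_wedge_open() == 0);

	printf("serialized opens avoid EBUSY\n");
	return 0;
}

[The patch serializes all opens, which also loses parallelism on disks
that have no wedges; a narrower fix might retry wedges after their
parents close, but that ordering is what the author says is unclear.]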
