> Module Name: src > Committed By: christos > Date: Thu Aug 24 19:51:24 UTC 2023 > > Modified Files: > src/sys/compat/linux/common: linux_inotify.c > > Log Message: > fix a locking bug (Theodore Preduta) > > if (needs_lock) > vn_lock(vp, LK_SHARED | LK_RETRY); > + else > + /* > + * XXX We need to temprarily drop v_interlock because > + * it may be temporarily acquired by biowait(). > + */ > + mutex_exit(vp->v_interlock); > + KASSERT(!mutex_owned(vp->v_interlock)); > error = VOP_READDIR(vp, &uio, fp->f_cred, &eofflag, NULL, NULL); > if (needs_lock) > VOP_UNLOCK(vp); > + else > + mutex_enter(vp->v_interlock);
This looks questionable. 1. Why is v_interlock held in the first place? 2. Why is it safe to release v_interlock here, if the caller holds it? 3. What is the lock order for all the locks involved? I see that inotify_readdir via get_inotify_dir_entries has two call sites, leading to the following lock orders: - linux_sys_inotify_add_watch -> mutex_enter(&ifd->ifd_lock) -> get_inotify_dir_entries(..., needs_lock=true) -> inotify_readdir(..., needs_lock=true) -> vn_lock(vp) -> VOP_READDIR(vp) - handle_write -> mutex_enter(&ifd->ifd_lock) -> get_inotify_dir_entries(..., needs_lock=false) -> inotify_readdir(..., needs_lock=true) -> mutex_exit(vp->v_interlock) -> VOP_READDIR(vp) Since VOP_READDIR requires vp to be locked, I can infer that the handle_write caller must already hold vp locked. But that means that we have ifd_lock -> vnode lock in one path, and vnode lock -> ifd_lock in another path, which is forbidden (unless there's some reason we can prove these paths never happen concurrently). I'm also suspicious of logic like this: mutex_enter(&fp->f_lock); uio.uio_offset = fp->f_offset; mutex_exit(&fp->f_lock); ... mutex_enter(&fp->f_lock); fp->f_offset = uio.uio_offset; mutex_exit(&fp->f_lock); If fp->f_offset can change concurrently while f_lock is released, this looks like a problem. But if it can't, why do we need the lock at all? I suspect this really does need a lock over the whole transaction (maybe fp->f_lock, maybe something else), which is not released while I/O is happening -- possibly not with mutex(9) but something interruptible instead.