Module Name:	src
Committed By:	thorpej
Date:		Sun Oct 10 18:07:52 UTC 2021

Modified Files:
	src/sys/kern: kern_event.c kern_exec.c kern_exit.c kern_fork.c
	src/sys/sys: event.h eventvar.h proc.h

Log Message:
Changes to make EVFILT_PROC MP-safe:

Because the locking protocol around processes is somewhat complex
compared to other events that can be posted on kqueues, introduce
new functions for posting NOTE_EXEC, NOTE_EXIT, and NOTE_FORK,
rather than just using the generic knote() function.  These functions
KASSERT() their locking expectations, and deal with other complexities
for each situation.

knote_proc_fork(), in particular, needs to handle NOTE_TRACK, which
requires allocation of a new knote to attach to the child process.  We
don't want to be allocating memory while holding the parent's p_lock.
Furthermore, we also have to attach the tracking note to the child
process, which means we have to acquire the child's p_lock.

So, to handle all this, we introduce some additional synchronization
infrastructure around the 'knote' structure:

- Add the ability to mark a knote as being in a state of flux.  Knotes
  in this state are guaranteed not to be detached/deleted, thus allowing
  a code path to drop other locks after putting a knote in this state.

- Code paths that wish to detach/delete a knote must first check if the
  knote is in-flux.  If so, they must wait for it to quiesce.  Because
  multiple threads of execution may attempt this concurrently, a mechanism
  exists for a single LWP to claim the detach responsibility; all other
  threads simply wait for the knote to disappear before they can make
  further progress.

- When kqueue_scan() encounters an in-flux knote, it simply treats the
  situation just like encountering another thread's queue marker -- wait
  for the flux to settle and continue on.

(The "in-flux knote" idea was inspired by FreeBSD, but this works
differently from their implementation, as the two kqueue implementations
have diverged quite a bit.)
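
The in-flux protocol described above can be modelled in user space.
The following is only a sketch under stated assumptions, not the code
from kern_event.c: it uses POSIX threads in place of the kernel's
locking primitives, and every name in it (model_knote, the MODEL_KN_*
flags, and all four functions) is hypothetical.  In particular, the
"gone" flag stands in for the knote actually being freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag names; not taken from the NetBSD sources. */
#define MODEL_KN_INFLUX     0x01u /* must not be detached right now */
#define MODEL_KN_WILLDETACH 0x02u /* one thread has claimed the detach */
#define MODEL_KN_GONE       0x04u /* the detach has completed */

struct model_knote {
	pthread_mutex_t lock;	/* stands in for the kqueue lock */
	pthread_cond_t cv;	/* broadcast on every state change */
	unsigned int flags;
};

static void
model_knote_init(struct model_knote *kn)
{
	pthread_mutex_init(&kn->lock, NULL);
	pthread_cond_init(&kn->cv, NULL);
	kn->flags = 0;
}

/*
 * Try to put the knote in flux.  Fails if a detach is pending, in
 * which case the caller should skip its work entirely.
 */
static bool
model_knote_enter_flux(struct model_knote *kn)
{
	bool ok = false;

	pthread_mutex_lock(&kn->lock);
	while ((kn->flags & MODEL_KN_WILLDETACH) == 0) {
		if ((kn->flags & MODEL_KN_INFLUX) == 0) {
			kn->flags |= MODEL_KN_INFLUX;
			ok = true;
			break;
		}
		/* Another thread holds it in flux; wait our turn. */
		pthread_cond_wait(&kn->cv, &kn->lock);
	}
	pthread_mutex_unlock(&kn->lock);
	return ok;
}

/* Take the knote out of flux and wake up anyone waiting on it. */
static void
model_knote_leave_flux(struct model_knote *kn)
{
	pthread_mutex_lock(&kn->lock);
	assert(kn->flags & MODEL_KN_INFLUX);
	kn->flags &= ~MODEL_KN_INFLUX;
	pthread_cond_broadcast(&kn->cv);
	pthread_mutex_unlock(&kn->lock);
}

/*
 * Returns true if the caller now owns the detach (after any flux has
 * settled).  Returns false if another thread owns it; in that case
 * this call does not return until the knote is gone.
 */
static bool
model_knote_claim_detach(struct model_knote *kn)
{
	bool mine;

	pthread_mutex_lock(&kn->lock);
	if (kn->flags & (MODEL_KN_WILLDETACH | MODEL_KN_GONE)) {
		while ((kn->flags & MODEL_KN_GONE) == 0)
			pthread_cond_wait(&kn->cv, &kn->lock);
		mine = false;
	} else {
		kn->flags |= MODEL_KN_WILLDETACH;
		while (kn->flags & MODEL_KN_INFLUX)
			pthread_cond_wait(&kn->cv, &kn->lock);
		mine = true;
	}
	pthread_mutex_unlock(&kn->lock);
	return mine;
}

/* Called by the detach owner once the knote has been torn down. */
static void
model_knote_detach_done(struct model_knote *kn)
{
	pthread_mutex_lock(&kn->lock);
	assert(kn->flags & MODEL_KN_WILLDETACH);
	kn->flags |= MODEL_KN_GONE;
	pthread_cond_broadcast(&kn->cv);
	pthread_mutex_unlock(&kn->lock);
}
```

In this model, a thread that gets true back from
model_knote_enter_flux() may drop its other locks and rely on the
knote staying attached until it calls model_knote_leave_flux().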

knote_proc_fork() uses this infrastructure to implement NOTE_TRACK like so:

- Attempt to put the original tracking knote into a state of flux; if this
  fails (because the note has a detach pending), we skip all processing
  (the original process has lost interest, and we simply won the race).

- Once the note is in-flux, drop the kq and forking process's locks, and
  allocate 2 knotes: one to post the NOTE_CHILD event, and one to attach
  a new NOTE_TRACK to the child process.  Notably, we do NOT go through
  kqueue_register() to do this, but rather do all of the work directly
  and KASSERT() our assumptions; this allows us to directly control our
  interaction with locks.  All memory allocations here are performed with
  KM_NOSLEEP, in order to prevent holding the original knote in-flux
  indefinitely.

- Because the NOTE_TRACK use case adds knotes to kqueues through a
  sort of back-door mechanism, we must serialize with the closing of
  the destination kqueue's file descriptor, so steal another bit from
  the kq_count field to notify other threads that a kqueue is on its
  way out to prevent new knotes from being enqueued while the close
  path detaches them.

In addition to fixing EVFILT_PROC's reliance on KERNEL_LOCK, this also
fixes a long-standing bug whereby a NOTE_CHILD event could be dropped
if the child process exited before the interested process received the
NOTE_CHILD event (the same knote would be used to deliver the NOTE_EXIT
event, and would clobber the NOTE_CHILD's 'data' field).

Add a bunch of comments to explain what's going on in various critical
sections, and sprinkle additional KASSERT()s to validate assumptions
in several more locations.
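
The kq_count bit-stealing mentioned in the NOTE_TRACK steps above can
be illustrated with a small model.  This is a sketch only: the struct,
the MODEL_KQ_* constants, the chosen bit position, and the function
names are all assumptions made for illustration, and the kqueue lock
that would protect the field is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: top bit is a flag, the rest is the count. */
#define MODEL_KQ_CLOSING  0x80000000u
#define MODEL_KQ_MAXCOUNT 0x7fffffffu

struct model_kqueue {
	uint32_t count;	/* pending-event count plus stolen flag bit */
};

/* The count with the flag bit masked off. */
static uint32_t
model_kq_count(const struct model_kqueue *kq)
{
	return kq->count & MODEL_KQ_MAXCOUNT;
}

static bool
model_kq_closing(const struct model_kqueue *kq)
{
	return (kq->count & MODEL_KQ_CLOSING) != 0;
}

/* Close path: announce that the kqueue is on its way out. */
static void
model_kq_begin_close(struct model_kqueue *kq)
{
	kq->count |= MODEL_KQ_CLOSING;
}

/*
 * Enqueue path: refuse new entries once the close has begun, so the
 * close path can detach the existing ones without racing new arrivals.
 */
static bool
model_kq_enqueue(struct model_kqueue *kq)
{
	if (model_kq_closing(kq))
		return false;
	assert(model_kq_count(kq) < MODEL_KQ_MAXCOUNT);
	kq->count++;
	return true;
}
```

Keeping the flag in the same word as the count means both are read and
updated under whatever lock already protects the count, so no extra
field or lock is needed in this model.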

To generate a diff of this commit:
cvs rdiff -u -r1.128 -r1.129 src/sys/kern/kern_event.c
cvs rdiff -u -r1.509 -r1.510 src/sys/kern/kern_exec.c
cvs rdiff -u -r1.291 -r1.292 src/sys/kern/kern_exit.c
cvs rdiff -u -r1.226 -r1.227 src/sys/kern/kern_fork.c
cvs rdiff -u -r1.43 -r1.44 src/sys/sys/event.h
cvs rdiff -u -r1.9 -r1.10 src/sys/sys/eventvar.h
cvs rdiff -u -r1.368 -r1.369 src/sys/sys/proc.h

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.