On Wed, May 01, 2024 at 04:12:14PM -0700, Noah Misch wrote: > While writing an injection point test, I encountered a variant of the race > condition that f4083c4 fixed. It had three sessions and this sequence of > events: > > s1: local-attach to POINT > s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback() > s3: detach POINT, deleting the InjectionPointCondition record > s2: wake up and run POINT as though it had been non-local
Fun. One thing I would ask is why it makes sense to be able to detach a local point from a different session than the one who defined it as local. Shouldn't the operation of s3 be restricted rather than authorized as a safety measure, instead? > On Sat, Mar 16, 2024 at 08:40:21AM +0900, Michael Paquier wrote: >> On Fri, Mar 15, 2024 at 11:23:31AM +0200, Heikki Linnakangas wrote: >>> Wrt. the spinlock and shared memory handling, I think this would be simpler >>> if you could pass some payload in the InjectionPointAttach() call, which >>> would be passed back to the callback function: >>> >>> In this case, the payload would be the "slot index" in shared memory. >>> >>> Or perhaps always allocate, say, 1024 bytes of working area for every >>> attached injection point that the test module can use any way it wants. Like >>> for storing extra conditions, or for the wakeup counter stuff in >>> injection_wait(). A fixed size working area is a little crude, but would be >>> very handy in practice. > > That would be one good way to solve it. (Storing a slot index has the same > race condition, but it fixes the race to store a struct containing the PID.) > > The best alternative I see is to keep an InjectionPointCondition forever after > creating it. Give it a "bool valid" field that we set on detach. I don't see > a major reason to prefer one of these over the other. One puts a negligible > amount of memory pressure on the main segment, but it simplifies the module > code. I lean toward the "1024 bytes of working area" idea. Other ideas or > opinions? If more state data is needed, the fixed area injection_point.c would be better. Still, I am not sure that this is required here, either. > Separately, injection_points_cleanup() breaks the rules by calling > InjectionPointDetach() while holding a spinlock. The latter has an > elog(ERROR), and reaching that elog(ERROR) leaves a stuck spinlock. I haven't > given as much thought to solutions for this one. Indeed. That's a brain fade. This one could be fixed by collecting the point names when cleaning up the conditions and detach after releasing the spinlock. This opens a race condition between the moment when the spinlock is released and the detach, where another backend could come in and detach a point before the shmem_exit callback has the time to do its cleanup, even if detach() is restricted for local points. So we could do the callback cleanup in three steps in the shmem exit callback: - Collect the names of the points to detach, while holding the spinlock. - Do the Detach. - Take again the spinlock, clean up the conditions. Please see the attached. -- Michael
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c index ace386da50..caf58ef9db 100644 --- a/src/test/modules/injection_points/injection_points.c +++ b/src/test/modules/injection_points/injection_points.c @@ -159,10 +159,39 @@ injection_point_allowed(const char *name) static void injection_points_cleanup(int code, Datum arg) { + char names[INJ_MAX_CONDITION][INJ_NAME_MAXLEN] = {0}; + int count = 0; + /* Leave if nothing is tracked locally */ if (!injection_point_local) return; + /* + * This is done in three steps: detect the points to detach, + * detach them and release their conditions. + */ + SpinLockAcquire(&inj_state->lock); + for (int i = 0; i < INJ_MAX_CONDITION; i++) + { + InjectionPointCondition *condition = &inj_state->conditions[i]; + + if (condition->name[0] == '\0') + continue; + + if (condition->pid != MyProcPid) + continue; + + /* Extract the point name to detach */ + strlcpy(names[count], condition->name, INJ_NAME_MAXLEN); + count++; + } + SpinLockRelease(&inj_state->lock); + + /* Detach, without holding the spinlock */ + for (int i = 0; i < count; i++) + InjectionPointDetach(names[i]); + + /* Clear all the conditions */ SpinLockAcquire(&inj_state->lock); for (int i = 0; i < INJ_MAX_CONDITION; i++) { @@ -174,8 +203,6 @@ injection_points_cleanup(int code, Datum arg) if (condition->pid != MyProcPid) continue; - /* Detach the injection point and unregister condition */ - InjectionPointDetach(condition->name); condition->name[0] = '\0'; condition->pid = 0; } @@ -403,6 +430,10 @@ injection_points_detach(PG_FUNCTION_ARGS) { char *name = text_to_cstring(PG_GETARG_TEXT_PP(0)); + if (!injection_point_allowed(name)) + elog(ERROR, "cannot detach injection point \"%s\" not allowed to run", + name); + InjectionPointDetach(name); if (inj_state == NULL)
signature.asc
Description: PGP signature