On 5/27/26 12:11, Martin Wilck wrote:
On Wed, 2026-05-27 at 11:40 +0200, Hannes Reinecke wrote:
On 5/22/26 18:44, Martin Wilck wrote:
Use shared_ptr to track the runner_context refcount.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
---
   libmpathutil/runner.c | 30 +++++++++---------------------
   1 file changed, 9 insertions(+), 21 deletions(-)

diff --git a/libmpathutil/runner.c b/libmpathutil/runner.c
index 8c6d6b9..56abd03 100644
--- a/libmpathutil/runner.c
+++ b/libmpathutil/runner.c
@@ -30,7 +30,6 @@ const char *runner_state_name(int state)
   }
  struct runner_context {
-       int refcount;
        int status;
        struct timespec deadline;
        pthread_t thr;
@@ -39,17 +38,6 @@ struct runner_context {
        char __attribute__((aligned(sizeof(void *)))) data[];
   };
-static void release_context(struct runner_context *rctx)
-{
-       int n;
-
-       n = uatomic_sub_return(&rctx->refcount, 1);
-       assert(n >= 0);
-
-       if (n == 0)
-               free(rctx);
-}
-
   static void cleanup_context(struct runner_context **prctx)
   {
        struct runner_context *rctx = *prctx;
@@ -65,7 +53,7 @@ static void cleanup_context(struct runner_context
**prctx)
                        "%s: runner %p finished in state '%s'",
__func__, rctx,
                        runner_state_name(st));
        }
-       release_context(rctx);
+       put_shared_ptr(rctx);
   }
  static void *runner_thread(void *arg)
@@ -147,7 +135,7 @@ repeat:
   void release_runner(struct runner_context *rctx)
   {
        cancel_runner(rctx);
-       release_context(rctx);
+       put_shared_ptr(rctx);
   }
  int check_runner(struct runner_context *rctx, void *data,
unsigned int size)
@@ -195,17 +183,17 @@ struct runner_context *get_runner(runner_func
func, void *data,
                return NULL;
        }
- rctx = malloc(sizeof(*rctx) + size);
+       rctx = alloc_shared_ptr(sizeof(*rctx) + size, NULL);
        if (!rctx)
                return NULL;
   rctx->func = func;
        /*
-        * We have to set the refcount to 2 here. The runner
thread may be
-        * cancelled before it even had the chance to increase the
refcount,
-        * which could result in a use-after-free in
cleanup_context().
+        * Take an additional reference here. The runner thread
may be
+        * cancelled before it even had the chance to take a
reference, which
+        * could result in a use-after-free in cleanup_context().
         */
-       uatomic_set(&rctx->refcount, 2);
+       get_shared_ptr(rctx);
        uatomic_set(&rctx->status, RUNNER_IDLE);
        memcpy(rctx->data, data, size);
@@ -222,8 +210,8 @@ struct runner_context *get_runner(runner_func
func, void *data,
   if (rc) {
                condlog(1, "%s: pthread_create(): %s", __func__,
strerror(rc));
-               uatomic_dec(&rctx->refcount);
-               release_context(rctx);
+               put_shared_ptr(rctx);
+               put_shared_ptr(rctx);
                return NULL;
        }
        return rctx;

Hmm. While I fully get the idea why one would want to use refcounted
objects, there really is only one snag: this is multipath.
We really, _really_, should make sure the daemon can execute even if
the root filesystem is not available.
So if we start dropping the reference for the checker (eg if the last
path drops and the checker is no longer used), we might end up
freeing
the memory (and dropping the modules altogether).

I am pretty sure that we handle this correctly. The current patch set
doesn't change anything in this respect, except translating the hard-
coded refcounting into the shared_ptr framework.

But if the path comes back we need to reinstate the checker, which
not
only requires additional memory (which might need a recursion into
the
filesystem to get free pages) but we also might need to read the
checker
module from disk (again). So plenty of opportunity to deaslock
waithing
for the disk be become readable.

In checker_get(), we call add_checker_class() if the class is not yet
loaded, which will take one ref on the class, and then get_shared_ptr()
for every path (including the one for which add_checker_class() had
been called), which means we have (number of paths + 1) references on
the class, and will only drop the last ref when multipathd calls
cleanup_checkers() during exit.

Therefore I don't think it's possible that we unload the shared library
prematurely, and have to reload it later.

Hmm.

Really hmmmmm.

If the checkers are loaded during startup, and unloaded on shutdown,
there really is no point in refcounting them, is there?
Wouldn't global pointers to each check sufficient here, avoiding
the need for refcounting altogether?

Cheers,

Hannes
--
Dr. Hannes Reinecke                  Kernel Storage Architect
[email protected]                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

Reply via email to