Hi Brian,
On Mon, 2026-02-23 at 17:00 -0800, Brian Bunker wrote:
> Fix a race condition in libcheck_check() for both the TUR and ALUA
> async
> checkers where a completed result could be missed, returning
> PATH_PENDING
> unnecessarily.
First of all, you should have cc'd the dm-devel mailing list.
Second, you call this a race condition, but what negative effect does
it actually have? AFAIU this "race condition" just delays returning the
new checker state by a single tick (i.e. 1 second), which is a rather
benign problem for a race, and generally not a big problem for
multipath, where paths can be offline for much longer periods of time.
Finally, before going into any detail, I am highly reluctant to changes
in this area. Ben and I have fought with deadlocks and races in corner
cases for a long time, and we seem to have solved it for good; at least
we haven't seen any more bugs in this area for quite some time. For a
change like this, figuring out possible side effects for all possible
cases by code inspection alone is extremely hard. I'd rather not risk a
regression just to detect a status change 1 second earlier than we
currently do, in rare cases.
>
> Race window
> -----------
>
> In libcheck_thread() / alua_thread():
>
> pthread_mutex_lock(&ct->lock);
> ct->state = state;
> ct->msgid = msgid;
> pthread_cond_signal(&ct->active);
> pthread_mutex_unlock(&ct->lock);
> /* <-- race window here --> */
> uatomic_set(&ct->running, 0);
>
You seem to be looking at old code. The code I am seeing has this:
running = uatomic_xchg(&ct->running, 0);
if (!running)
pause();
Anyway, I think we had very good reasons to set the running state after
releasing the lock. I would need to dig deeper into the history in
order to figure out exactly why we did it this way. It probably has
something to do with the actual fatal races that we observed between
checkers timing out / being cancelled and finishing.
> When libcheck_check() observes ct->running != 0 during this window,
> the
> result is already available under lock, but the previous code could
> fail
> to collect it.
>
> Why the old check was broken
> ----------------------------
>
> The ALUA checker used:
>
> if (ct->state != PATH_PENDING || ct->msgid != MSG_ALUA_RUNNING)
>
> This conflated two different meanings of PATH_PENDING:
> 1. Thread still in progress (no result yet)
> 2. ALUA state is transitioning (valid completed result with
> MSG_ALUA_TRANSITIONING)
True, but what's the actual problem with this? In both cases, the
checker is indeed not yet able to figure out if the path is usable,
which is what CHECKER_PENDING means.
>
> The TUR checker had similar issues since PATH_PENDING with
> MSG_TUR_TRANSITIONING is a valid result.
>
> The fix
> -------
>
> Check ct->msgid alone under lock. The msgid is set to MSG_*_RUNNING
> only
> at async check start and atomically updated with state when complete:
>
> if (ct->msgid != MSG_TUR_RUNNING) {
> tur_status = ct->state;
> c->msgid = ct->msgid;
> ct->thread = 0;
> }
You probably realize that with this change, you basically imply that
the test for ct->running is superfluous. If this was true, we could
actually just skip it and solely rely on ct->msgid. What you're
basically saying is that ct->running is meaningless. But it isn't. It
means that the thread is "dead", which it isn't as long as holds the
mutex.
I'll wait for Ben's opinion, but I am inclined to nack this, sorry.
Martin