On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> If a multipath device is removed during, or immediately before the
> call
> to check_path(), multipathd can behave incorrectly. A missing
> multpath
> device will cause update_multipath_strings() to fail, setting
> pp->dmstate to PSTATE_UNDEF.  If the path is up, this state will
> cause
> reinstate_path() to be called, which will also fail.  This will
> trigger
> a reload, restoring the recently removed device.
> 
> If update_multipath_strings() fails because there is no multipath
> device, check_path should just quit, since the remove dmevent and
> uevent
> are likely already queued up. Also, I don't see any reason to reload
> the
> multipath device if reinstate fails. This code was added by
> fac68d7a99ef17d496079538a5c6836acd7911ab, which clamined that
> reinstate
> could fail if the path was disabled.  Looking through the current
> kernel
> code, I can't see any reason why a reinstate would fail, where a
> reload
> would help. If the path was missing from the multipath device,
> update_multipath_strings() would already catch that, and quit
> check_path() early, which make more sense to me than reloading does.

fac68d7 is related to the famous "dm-multipath: Accept failed paths for
multipath maps" patch (e.g. 
https://patchwork.kernel.org/patch/3368381/#7193001), which never made
it upstream. SUSE kernels have shipped this patch for a long time, but
we don't apply it any more in recent kernels.

With this patch, The reinstate_path() failure would occur if multipathd
had created a table with a "disabled" device (one which would be
present in a dm map even though the actual block device didn't exist),
and then tried to reinstate such a path. It sounds unlikely, but it
might be possible if devices are coming and going in quick succession.
In that situation, and with the "accept failed path" patch applied, a
reload makes some sense, because reload (unlike reinstate) would not
fail (at least not for this reason) and would actually add that just-
reinstated path. OTOH, it's not likely that the reload would improve
matters, either. After all, multipathd is just trying to reinstate a
non-existing path. So, I'm fine with skipping the reload.

> 
> Signed-off-by: Benjamin Marzinski <[email protected]>
> ---
>  multipathd/main.c | 36 ++++++++++++++----------------------
>  1 file changed, 14 insertions(+), 22 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 6b7db2c0..8fb73922 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1611,22 +1611,18 @@ fail_path (struct path * pp, int del_active)
>  /*
>   * caller must have locked the path list before calling that
> function
>   */
> -static int
> +static void
>  reinstate_path (struct path * pp)
>  {
> -     int ret = 0;
> -
>       if (!pp->mpp)
> -             return 0;
> +             return;
>  
> -     if (dm_reinstate_path(pp->mpp->alias, pp->dev_t)) {
> +     if (dm_reinstate_path(pp->mpp->alias, pp->dev_t))
>               condlog(0, "%s: reinstate failed", pp->dev_t);
> -             ret = 1;
> -     } else {
> +     else {
>               condlog(2, "%s: reinstated", pp->dev_t);
>               update_queue_mode_add_path(pp->mpp);
>       }
> -     return ret;
>  }
>  static void
> @@ -2088,8 +2084,13 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
>        * Synchronize with kernel state
>        */
>       if (update_multipath_strings(pp->mpp, vecs->pathvec, 1)) {
> +             struct dm_info info;
>               condlog(1, "%s: Could not synchronize with kernel
> state",
>                       pp->dev);
> +             if (pp->mpp && pp->mpp->alias &&
> +                 do_dm_get_info(pp->mpp->alias, &info) == 0)
> +                     /* multipath device missing. Likely removed */
> +                     return 0;
>               pp->dmstate = PSTATE_UNDEF;

It would be more elegant if we could distinguish different failure
modes from update_multipath_strings() directly, without having to call
do_dm_get_info() again.

>       }
>       /* if update_multipath_strings orphaned the path, quit early */
> @@ -2179,12 +2180,8 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
>               /*
>                * reinstate this path
>                */
> -             if (!disable_reinstate && reinstate_path(pp)) {
> -                     condlog(3, "%s: reload map", pp->dev);
> -                     ev_add_path(pp, vecs, 1);
> -                     pp->tick = 1;
> -                     return 0;
> -             }
> +             if (!disable_reinstate)
> +                     reinstate_path(pp);
>               new_path_up = 1;
>  
>               if (oldchkrstate != PATH_UP && oldchkrstate !=
> PATH_GHOST)
> @@ -2200,15 +2197,10 @@ check_path (struct vectors * vecs, struct
> path * pp, unsigned int ticks)
>       else if (newstate == PATH_UP || newstate == PATH_GHOST) {
>               if ((pp->dmstate == PSTATE_FAILED ||
>                   pp->dmstate == PSTATE_UNDEF) &&
> -                 !disable_reinstate) {
> +                 !disable_reinstate)
>                       /* Clear IO errors */
> -                     if (reinstate_path(pp)) {
> -                             condlog(3, "%s: reload map", pp->dev);
> -                             ev_add_path(pp, vecs, 1);
> -                             pp->tick = 1;
> -                             return 0;
> -                     }
> -             } else {
> +                     reinstate_path(pp);
> +             else {
>                       LOG_MSG(4, verbosity, pp);
>                       if (pp->checkint != max_checkint) {
>                               /*

-- 
Dr. Martin Wilck <[email protected]>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer



--
dm-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/dm-devel

Reply via email to