On Thu, 2023-11-02 at 18:15 -0400, Benjamin Marzinski wrote:
> This option lets multipath set a scsi disk's max_retries sysfs value.
> Setting this can be helpful for cases where the path checker
> succeeds,
> but IO commands hang and timeout. By default, the SCSI layer will
> retry
> IOs 5 times. Reducing this value will allow multipath to retry the IO
> down another path sooner.
> 
> Signed-off-by: Benjamin Marzinski <[email protected]>

2 nitpicks below. Please explain to me again why we recommend to
activate shaky paths detection with this. What will go wrong if the
user uses max_retries without shaky path detection?

> ---
>  libmultipath/config.c         |  3 +++
>  libmultipath/config.h         |  3 +++
>  libmultipath/dict.c           | 34 ++++++++++++++++++++++++++++
>  libmultipath/discovery.c      | 42
> ++++++++++++++++++++++++++++++++++-
>  libmultipath/propsel.c        | 18 +++++++++++++++
>  libmultipath/propsel.h        |  1 +
>  libmultipath/structs.h        |  7 ++++++
>  multipath/multipath.conf.5.in | 22 ++++++++++++++++++
>  8 files changed, 129 insertions(+), 1 deletion(-)
> 
> diff --git a/libmultipath/dict.c b/libmultipath/dict.c
> index 044067af..fc438947 100644
> --- a/libmultipath/dict.c
> +++ b/libmultipath/dict.c
> @@ -1152,6 +1152,36 @@ declare_hw_snprint(eh_deadline,
> print_undef_off_zero)
>  declare_pc_handler(eh_deadline, set_undef_off_zero)
>  declare_pc_snprint(eh_deadline, print_undef_off_zero)
>  
> +static int
> +set_max_retries(vector strvec, void *ptr, const char *file, int
> line_nr)
> +{
> +     char * buff;
> +     int *int_ptr = (int *)ptr;
> +
> +     buff = set_value(strvec);
> +     if (!buff)
> +             return 1;
> +
> +     if (strcmp(buff, "off") == 0)
> +             *int_ptr = UOZ_OFF;
> +     else if (strcmp(buff, "0") == 0)
> +             *int_ptr = UOZ_ZERO;

Why not use MAX_RETRIES_OFF and MAX_RETRIES_ZERO here?

> +     else
> +             do_set_int(strvec, int_ptr, 1, 5, file, line_nr,
> buff);
> +
> +     free(buff);
> +     return 0;
> +}
> +

> diff --git a/multipath/multipath.conf.5.in
> b/multipath/multipath.conf.5.in
> index 226d0019..41f3927e 100644
> --- a/multipath/multipath.conf.5.in
> +++ b/multipath/multipath.conf.5.in
> @@ -793,6 +793,22 @@ The default is: \fB<unset>\fR
>  .
>  .
>  .TP
> +.B max_retries
> +Specify the maximum number of times the SCSI layer will retry IO
> commands
> +before returning failure. Setting this can be helpful for cases
> where the path
> +checker succeeds, but IO commands hang and timeout. By default, the
> SCSI layer
> +will retry IOs 5 times. Reducing this value will allow multipath to
> retry the IO
> +down another path sooner. \fBNote:\fR If it is necessary to set this

Maybe you should add "[retry IO commands] for some types of SCSI
errors". Otherwise users might get their hopes too high. IIRC the main
cases that matter here are DID_TRANSPORT_DISRUPTED and perhaps
DID_BUS_BUSY (don't put that in the man page) ;-)



> value, it
> +is also recommended to set up shaky paths detection. See "Shaky
> paths detection"
> +below. Valid values are
> +\fB0\fR through \fB5\fR.
> +.RS
> +.TP
> +The default is: \fB<unset>\fR
> +.RE




Reply via email to