On Thu, 2023-11-02 at 18:15 -0400, Benjamin Marzinski wrote:
> This option lets multipath set a scsi disk's max_retries sysfs value.
> Setting this can be helpful for cases where the path checker
> succeeds,
> but IO commands hang and timeout. By default, the SCSI layer will
> retry
> IOs 5 times. Reducing this value will allow multipath to retry the IO
> down another path sooner.
>
> Signed-off-by: Benjamin Marzinski <[email protected]>
2 nitpicks below. Please explain to me again why we recommend to
activate shaky paths detection with this. What will go wrong if the
user uses max_retries without shaky path detection?
> ---
> libmultipath/config.c | 3 +++
> libmultipath/config.h | 3 +++
> libmultipath/dict.c | 34 ++++++++++++++++++++++++++++
> libmultipath/discovery.c | 42
> ++++++++++++++++++++++++++++++++++-
> libmultipath/propsel.c | 18 +++++++++++++++
> libmultipath/propsel.h | 1 +
> libmultipath/structs.h | 7 ++++++
> multipath/multipath.conf.5.in | 22 ++++++++++++++++++
> 8 files changed, 129 insertions(+), 1 deletion(-)
>
> diff --git a/libmultipath/dict.c b/libmultipath/dict.c
> index 044067af..fc438947 100644
> --- a/libmultipath/dict.c
> +++ b/libmultipath/dict.c
> @@ -1152,6 +1152,36 @@ declare_hw_snprint(eh_deadline,
> print_undef_off_zero)
> declare_pc_handler(eh_deadline, set_undef_off_zero)
> declare_pc_snprint(eh_deadline, print_undef_off_zero)
>
> +static int
> +set_max_retries(vector strvec, void *ptr, const char *file, int
> line_nr)
> +{
> + char * buff;
> + int *int_ptr = (int *)ptr;
> +
> + buff = set_value(strvec);
> + if (!buff)
> + return 1;
> +
> + if (strcmp(buff, "off") == 0)
> + *int_ptr = UOZ_OFF;
> + else if (strcmp(buff, "0") == 0)
> + *int_ptr = UOZ_ZERO;
Why not use MAX_RETRIES_OFF and MAX_RETRIES_ZERO here?
> + else
> + do_set_int(strvec, int_ptr, 1, 5, file, line_nr,
> buff);
> +
> + free(buff);
> + return 0;
> +}
> +
> diff --git a/multipath/multipath.conf.5.in
> b/multipath/multipath.conf.5.in
> index 226d0019..41f3927e 100644
> --- a/multipath/multipath.conf.5.in
> +++ b/multipath/multipath.conf.5.in
> @@ -793,6 +793,22 @@ The default is: \fB<unset>\fR
> .
> .
> .TP
> +.B max_retries
> +Specify the maximum number of times the SCSI layer will retry IO
> commands
> +before returning failure. Setting this can be helpful for cases
> where the path
> +checker succeeds, but IO commands hang and timeout. By default, the
> SCSI layer
> +will retry IOs 5 times. Reducing this value will allow multipath to
> retry the IO
> +down another path sooner. \fBNote:\fR If it is necessary to set this
Maybe you should add "[retry IO commands] for some types of SCSI
errors". Otherwise users might get their hopes too high. IIRC the main
cases that matter here are DID_TRANSPORT_DISRUPTED and perhaps
DID_BUS_BUSY (don't put that in the man page) ;-)
> value, it
> +is also recommended to set up shaky paths detection. See "Shaky
> paths detection"
> +below. Valid values are
> +\fB0\fR through \fB5\fR.
> +.RS
> +.TP
> +The default is: \fB<unset>\fR
> +.RE