On Sun, Jun 07, 2020 at 02:30:23PM -0500, jassisinghb...@gmail.com wrote:
> From: Jassi Brar <jaswinder.si...@linaro.org>
>
> Currently scmi_do_xfer() submits a message to mailbox api and waits
> for an apparently very short time. This works if there are not many
> messages in the queue already. However, if many clients share a
> channel and/or each client submits many messages in a row, the

The recommendation in such scenarios is to use multiple channels.

> timeout value becomes too short and returns error even if the mailbox
> is working fine according to the load. The timeout occurs when the
> message is still in the api/queue awaiting its turn to ride the bus.
>
>  Fix this by increasing the timeout value enough (500ms?) so that it
> fails only if there is an actual problem in the transmission (like a
> lockup or crash).
>
> [If we want to capture a situation when the remote didn't
> respond within expected latency, then the timeout should not
> start here, but from tx_prepare callback ... just before the
> message physically gets on the channel]
>

The bottleneck may not be in the remote. It may be the mailbox serialising
the requests even when it could parallelise them.

> Signed-off-by: Jassi Brar <jaswinder.si...@linaro.org>
> ---
>  drivers/firmware/arm_scmi/driver.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
> index dbec767222e9..46ddafe7ffc0 100644
> --- a/drivers/firmware/arm_scmi/driver.c
> +++ b/drivers/firmware/arm_scmi/driver.c
> @@ -303,7 +303,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
>       }
>
>       if (xfer->hdr.poll_completion) {
> -             ktime_t stop = ktime_add_ns(ktime_get(), SCMI_MAX_POLL_TO_NS);
> +             ktime_t stop = ktime_add_ns(ktime_get(), 500 * 1000 * NSEC_PER_USEC);
>

This is an unacceptable delay for schedutil fast_switch, so no to this one.

>               spin_until_cond(scmi_xfer_done_no_timeout(cinfo, xfer, stop));
>
> @@ -313,7 +313,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
>                       ret = -ETIMEDOUT;
>       } else {
>               /* And we wait for the response. */
> -             timeout = msecs_to_jiffies(info->desc->max_rx_timeout_ms);
> +             timeout = msecs_to_jiffies(500);

In general, this hides issues in the remote. We are trying to move towards
1ms tops for a request, and with MBOX_QUEUE at 20, I think 20ms is more than
big enough. We have it set to 30ms now. 500ms is way too large and not
required IMO.
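
To put rough numbers on that, here is a minimal sketch in plain userspace C
(not kernel code). The macro and helper names are made up for illustration;
the figures are the ones quoted in this thread (queue depth 20, a 1ms target
per request, 30ms current timeout, 500ms proposed), and it assumes requests
are served strictly one after another.

```c
#include <assert.h>

/* Figures quoted in this thread; the names are hypothetical. */
#define QUEUE_DEPTH          20u   /* mailbox queue length mentioned above */
#define PER_REQUEST_MS        1u   /* target upper bound for one request */
#define CURRENT_TIMEOUT_MS   30u   /* timeout in use today */
#define PROPOSED_TIMEOUT_MS 500u   /* timeout proposed by the patch */

/*
 * Worst-case wait for the last message in a full queue, assuming the
 * queue is drained serially at per_req_ms per request.
 */
static unsigned int worst_case_queue_wait_ms(unsigned int depth,
                                             unsigned int per_req_ms)
{
        return depth * per_req_ms;
}

/*
 * Number of back-to-back requests that fit inside timeout_ms before the
 * waiter gives up, under the same serial-drain assumption.
 */
static unsigned int requests_covered(unsigned int timeout_ms,
                                     unsigned int per_req_ms)
{
        assert(per_req_ms != 0);
        return timeout_ms / per_req_ms;
}
```

With those inputs, worst_case_queue_wait_ms(QUEUE_DEPTH, PER_REQUEST_MS)
comes to 20ms, which the current 30ms timeout already covers, while
requests_covered(PROPOSED_TIMEOUT_MS, PER_REQUEST_MS) is 500, i.e. 25 times
the queue depth.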

--
Regards,
Sudeep
