On Sun, Jun 07, 2020 at 02:30:23PM -0500, jassisinghb...@gmail.com wrote:
> From: Jassi Brar <jaswinder.si...@linaro.org>
>
> Currently scmi_do_xfer() submits a message to mailbox api and waits
> for an apparently very short time. This works if there are not many
> messages in the queue already. However, if many clients share a
> channel and/or each client submits many messages in a row, the

The recommendation in such scenarios is to use multiple channels.

> timeout value becomes too short and returns error even if the mailbox
> is working fine according to the load. The timeout occurs when the
> message is still in the api/queue awaiting its turn to ride the bus.
>
> Fix this by increasing the timeout value enough (500ms?) so that it
> fails only if there is an actual problem in the transmission (like a
> lockup or crash).
>
> [If we want to capture a situation when the remote didn't
> respond within expected latency, then the timeout should not
> start here, but from tx_prepare callback ... just before the
> message physically gets on the channel]
>

The bottleneck may not be in the remote. It may be the mailbox serialising
the requests even when it could parallelise them.

> Signed-off-by: Jassi Brar <jaswinder.si...@linaro.org>
> ---
>  drivers/firmware/arm_scmi/driver.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
> index dbec767222e9..46ddafe7ffc0 100644
> --- a/drivers/firmware/arm_scmi/driver.c
> +++ b/drivers/firmware/arm_scmi/driver.c
> @@ -303,7 +303,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
>  	}
>
>  	if (xfer->hdr.poll_completion) {
> -		ktime_t stop = ktime_add_ns(ktime_get(), SCMI_MAX_POLL_TO_NS);
> +		ktime_t stop = ktime_add_ns(ktime_get(), 500 * 1000 * NSEC_PER_USEC);
>

This is an unacceptable delay for schedutil fast_switch. So no to this one.

>  		spin_until_cond(scmi_xfer_done_no_timeout(cinfo, xfer, stop));
>
> @@ -313,7 +313,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
>  		ret = -ETIMEDOUT;
>  	} else {
>  		/* And we wait for the response. */
> -		timeout = msecs_to_jiffies(info->desc->max_rx_timeout_ms);
> +		timeout = msecs_to_jiffies(500);

In general, this hides issues in the remote.
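For scale, here is a rough userspace sketch (not kernel code) of what the two constants proposed in the patch amount to. NSEC_PER_USEC and NSEC_PER_MSEC are redefined locally to mirror the kernel's values, approx_msecs_to_jiffies() is a hypothetical round-up stand-in for msecs_to_jiffies(), and any HZ value passed to it is only an example configuration. Since the poll path busy-waits in spin_until_cond(), a transfer that never completes could keep a CPU spinning for up to the whole poll deadline.

```c
#include <assert.h>

/* Userspace stand-ins mirroring the kernel's time constants (assumption). */
#define NSEC_PER_USEC 1000LL
#define NSEC_PER_MSEC 1000000LL

/* Poll deadline proposed by the patch, in nanoseconds. */
static long long proposed_poll_deadline_ns(void)
{
	return 500 * 1000 * NSEC_PER_USEC;
}

/* The same deadline expressed in milliseconds. */
static long long proposed_poll_deadline_ms(void)
{
	return proposed_poll_deadline_ns() / NSEC_PER_MSEC;
}

/*
 * Hypothetical stand-in for msecs_to_jiffies(): converts milliseconds to
 * ticks at a given HZ, rounding up so a nonzero timeout never becomes 0.
 */
static unsigned long approx_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return (unsigned long)(((unsigned long long)ms * hz + 999) / 1000);
}
```

With HZ=250, for example, approx_msecs_to_jiffies(500, 250) gives 125 ticks, and the poll deadline comes out at 500000000 ns, i.e. 500 ms of potential spinning.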

We are trying to move towards 1ms at most for a request, and with
MBOX_QUEUE at 20, I see 20ms as more than big enough. We have it set to
30ms now. 500ms is way too large and not required IMO.

--
Regards,
Sudeep
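The sizing argument in the last paragraph is simple arithmetic. A minimal sketch, assuming the mailbox services requests strictly one at a time with a fixed per-request latency (the 1 ms and queue depth of 20 quoted above); worst_case_wait_ms() and timeout_covers_queue() are hypothetical helpers for illustration, not kernel APIs.

```c
#include <assert.h>

/*
 * Worst-case time (ms) a message can wait for its response, assuming the
 * mailbox drains a full queue of queue_depth requests serially and each
 * request takes per_request_ms.
 */
static unsigned int worst_case_wait_ms(unsigned int queue_depth,
				       unsigned int per_request_ms)
{
	assert(queue_depth > 0);
	return queue_depth * per_request_ms;
}

/* Nonzero if timeout_ms covers that worst-case queueing delay. */
static int timeout_covers_queue(unsigned int timeout_ms,
				unsigned int queue_depth,
				unsigned int per_request_ms)
{
	return timeout_ms >= worst_case_wait_ms(queue_depth, per_request_ms);
}
```

With these figures the worst case is 20 ms, so the current 30 ms leaves 10 ms of headroom, while 500 ms is 25 times the worst-case queueing delay.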