Haren Myneni <ha...@linux.ibm.com> writes: > VAS allocate, modify and deallocate HCALLs returns > H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy > delay and expects OS to reissue HCALL after that delay. But using > msleep() will often sleep at least 20 msecs even though the > hypervisor suggests OS reissue these HCALLs after 1 or 10msecs. > > The open and close VAS window functions hold mutex and then issue > these HCALLs. So these operations can take longer than the > necessary when multiple threads issue open or close window APIs > simultaneously, especially might affect the performance in the > case of repeat open/close APIs for each compression request. > On the large machine configuration which allows more simultaneous > open/close windows (Ex: 240 cores provides 4800 VAS credits), the > user can observe hung task traces in dmesg due to mutex contention > around open/close HCAlls.
Is this because the workload queues enough tasks on the mutex to trigger the hung task watchdog? With a threshold of 120 seconds, something on the order of ~6000 tasks each taking 20ms or more to traverse this critical section would cause the problem I think you're describing. Presumably this change improves the situation, but the commit message isn't explicit. Have you measured the "throughput" of window open/close activity before and after? Anything that quantifies the improvement would be welcome. > diff --git a/arch/powerpc/platforms/pseries/vas.c > b/arch/powerpc/platforms/pseries/vas.c > index 71d52a670d95..79ffe8868c04 100644 > --- a/arch/powerpc/platforms/pseries/vas.c > +++ b/arch/powerpc/platforms/pseries/vas.c > @@ -38,7 +38,27 @@ static long hcall_return_busy_check(long rc) > { > /* Check if we are stalled for some time */ > if (H_IS_LONG_BUSY(rc)) { > - msleep(get_longbusy_msecs(rc)); > + unsigned int ms; > + /* > + * Allocate, Modify and Deallocate HCALLs returns > + * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC > + * for the long delay. So the sleep time should always > + * be either 1 or 10msecs, but in case if the HCALL > + * returns the long delay > 10 msecs, clamp the sleep > + * time to 10msecs. > + */ > + ms = clamp(get_longbusy_msecs(rc), 1, 10); > + > + /* > + * msleep() will often sleep at least 20 msecs even > + * though the hypervisor suggests that the OS reissue > + * HCALLs after 1 or 10msecs. Also the delay hint from > + * the HCALL is just a suggestion. So OK to pause for > + * less time than the hinted delay. Use usleep_range() > + * to ensure we don't sleep much longer than actually > + * needed. > + */ > + usleep_range(ms * 100, ms * USEC_PER_MSEC); usleep_range(ms * (USEC_PER_MSEC / 10), ms * USEC_PER_MSEC); is probably what reviewers want to see when they ask you to use USEC_PER_MSEC. I.e. both arguments to usleep_range() should be expressed in terms of USEC_PER_MSEC.