On 08/04/2014 12:25 AM, Davidlohr Bueso wrote:
On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
This patch set improves upon the rwsem optimistic spinning patch set
from Davidlohr to enable better performing rwsem and more aggressive
use of optimistic spinning.

By using a microbenchmark running 1 million lock-unlock operations per
thread on a 4-socket 40-core Westmere-EX x86-64 test machine running
3.16-rc7 based kernels, the following table shows the execution times
with 2/10 threads running on different CPUs on the same socket where
load is the number of pause instructions in the critical section:

   lock/r:w ratio # of threads  Load:Execution Time (ms)
   -------------- ------------  ------------------------
   mutex              2         1:530.7, 5:406.0, 10:472.7
   mutex             10         1:1848 , 5:2046 , 10:4394

Before patch:
   rwsem/0:1          2         1:339.4, 5:368.9, 10:394.0
   rwsem/1:1          2         1:2915 , 5:2621 , 10:2764
   rwsem/10:1         2         1:891.2, 5:779.2, 10:827.2
   rwsem/0:1         10         1:5618 , 5:5722 , 10:5683
   rwsem/1:1         10         1:14562, 5:14561, 10:14770
   rwsem/10:1        10         1:5914 , 5:5971 , 10:5912

After patch:
   rwsem/0:1         2          1:161.1, 5:244.4, 10:271.4
   rwsem/1:1         2          1:188.8, 5:212.4, 10:312.9
   rwsem/10:1        2          1:168.8, 5:179.5, 10:209.8
   rwsem/0:1        10          1:1306 , 5:1733 , 10:1998
   rwsem/1:1        10          1:1512 , 5:1602 , 10:2093
   rwsem/10:1       10          1:1267 , 5:1458 , 10:2233

% Change:
   rwsem/0:1         2          1:-52.5%, 5:-33.7%, 10:-31.1%
   rwsem/1:1         2          1:-93.5%, 5:-91.9%, 10:-88.7%
   rwsem/10:1        2          1:-81.1%, 5:-77.0%, 10:-74.6%
   rwsem/0:1        10          1:-76.8%, 5:-69.7%, 10:-64.8%
   rwsem/1:1        10          1:-89.6%, 5:-89.0%, 10:-85.8%
   rwsem/10:1       10          1:-78.6%, 5:-75.6%, 10:-62.2%
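
The % Change rows above are consistent with (after - before) / before,
so a negative value means a shorter execution time. A minimal Python
sketch of that arithmetic follows, assuming this is how the figures were
derived (the thread does not state the formula). The function and
variable names are illustrative only; the times are load-1 values copied
from the tables above.

```python
def percent_change(before_ms, after_ms):
    """Relative change in execution time, in percent (negative = faster)."""
    return (after_ms - before_ms) / before_ms * 100.0

# Execution times (ms) at load 1, keyed by (lock/r:w ratio, # of threads).
before = {
    ("rwsem/0:1", 2): 339.4,
    ("rwsem/1:1", 2): 2915.0,
    ("rwsem/1:1", 10): 14562.0,
}
after = {
    ("rwsem/0:1", 2): 161.1,
    ("rwsem/1:1", 2): 188.8,
    ("rwsem/1:1", 10): 1512.0,
}

# Round to one decimal place, matching the precision used in the table.
changes = {key: round(percent_change(before[key], after[key]), 1)
           for key in before}

for (lock, threads), pct in changes.items():
    print(f"{lock:<10} {threads:>2} threads  1:{pct:+.1f}%")
```

The same calculation applies to the AIM7 throughput numbers, where a
positive value is the improvement.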

So at a very low level you see nicer results, which aren't really
translating to much of a significant impact at a higher level (aim7).

I was using a 4-socket system for testing. I believe the performance
gain will be higher on larger machines. I will run some tests on those
larger machines as well.

It can be seen that there is dramatic reduction in the execution
times. The new rwsem is now even faster than mutex whether it is all
writers or a mixture of writers and readers.

Running the AIM7 benchmarks on the same 40-core system (HT off),
the performance improvements on some of the workloads were as follows:

       Workload      Before Patch       After Patch     % Change
       --------      ------------       -----------     --------
   custom (200-1000)    446135            477404         +7.0%
   custom (1100-2000)   449665            484734         +7.8%
   high_systime         152437            154217         +1.2%
    (200-1000)
   high_systime         269695            278942         +3.4%
    (1100-2000)

I worry about complicating rwsems even _more_ than they are, especially
for such a marginal gain. You might want to try other workloads, e.g.
postgresql (pgbench); I normally get pretty useful data from it when
dealing with rwsems.


Thanks for the info. I will try running pgbench as well.

-Longman