On 08/04/2014 12:25 AM, Davidlohr Bueso wrote:
> On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
>> This patch set improves upon the rwsem optimistic spinning patch set
>> from Davidlohr to enable a better-performing rwsem and more aggressive
>> use of optimistic spinning.
>>
>> Using a microbenchmark that runs 1 million lock-unlock operations per
>> thread on a 4-socket 40-core Westmere-EX x86-64 test machine running
>> 3.16-rc7 based kernels, the following table shows the execution times
>> with 2/10 threads running on different CPUs on the same socket, where
>> load is the number of pause instructions in the critical section:
>>
>>   lock/r:w ratio  # of threads  Load : Execution Time (ms)
>>   --------------  ------------  ---------------------------
>>   mutex                 2       1:530.7, 5:406.0, 10:472.7
>>   mutex                10       1:1848 , 5:2046 , 10:4394
>>
>>   Before patch:
>>   rwsem/0:1             2       1:339.4, 5:368.9, 10:394.0
>>   rwsem/1:1             2       1:2915 , 5:2621 , 10:2764
>>   rwsem/10:1            2       1:891.2, 5:779.2, 10:827.2
>>   rwsem/0:1            10       1:5618 , 5:5722 , 10:5683
>>   rwsem/1:1            10       1:14562, 5:14561, 10:14770
>>   rwsem/10:1           10       1:5914 , 5:5971 , 10:5912
>>
>>   After patch:
>>   rwsem/0:1             2       1:161.1, 5:244.4, 10:271.4
>>   rwsem/1:1             2       1:188.8, 5:212.4, 10:312.9
>>   rwsem/10:1            2       1:168.8, 5:179.5, 10:209.8
>>   rwsem/0:1            10       1:1306 , 5:1733 , 10:1998
>>   rwsem/1:1            10       1:1512 , 5:1602 , 10:2093
>>   rwsem/10:1           10       1:1267 , 5:1458 , 10:2233
>>
>>   % Change:
>>   rwsem/0:1             2       1:-52.5%, 5:-33.7%, 10:-31.1%
>>   rwsem/1:1             2       1:-93.5%, 5:-91.9%, 10:-88.7%
>>   rwsem/10:1            2       1:-81.1%, 5:-77.0%, 10:-74.6%
>>   rwsem/0:1            10       1:-76.8%, 5:-69.7%, 10:-64.8%
>>   rwsem/1:1            10       1:-89.6%, 5:-89.0%, 10:-85.8%
>>   rwsem/10:1           10       1:-78.6%, 5:-75.6%, 10:-62.2%
>
> So at a very low level you see nicer results, but they aren't really
> translating into much of a significant impact at a higher level (aim7).
I was using a 4-socket system for testing. I believe the performance gain will be higher on larger machines. I will run some tests on those larger machines as well.
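As a rough illustration of the kind of lock/unlock microbenchmark described in the quoted text, a minimal kernel-module sketch might look like the code below. The module name, parameters, and structure are assumptions for illustration only, not the actual test module that produced the numbers above; each thread does a fixed number of down_write()/up_write() pairs with cpu_relax() (pause) calls as critical-section load:

/*
 * Illustrative sketch only -- not the actual test module; all names and
 * parameters are made up.  Each kernel thread performs a fixed number of
 * write lock/unlock pairs on a shared rwsem, with a configurable number
 * of cpu_relax() (pause) instructions inside the critical section.
 */
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/rwsem.h>
#include <linux/completion.h>
#include <linux/atomic.h>
#include <linux/ktime.h>

static int nthreads = 2;			/* locking threads */
static int load = 1;				/* pause instructions per critical section */
static unsigned long iterations = 1000000;	/* lock/unlock pairs per thread */
module_param(nthreads, int, 0444);
module_param(load, int, 0444);
module_param(iterations, ulong, 0444);

static DECLARE_RWSEM(test_sem);
static DECLARE_COMPLETION(all_done);
static atomic_t threads_left;

static int lock_thread(void *unused)
{
	unsigned long i;
	int j;

	for (i = 0; i < iterations; i++) {
		down_write(&test_sem);		/* writer case (r:w ratio 0:1);  */
		for (j = 0; j < load; j++)	/* readers would use down_read() */
			cpu_relax();
		up_write(&test_sem);
	}
	if (atomic_dec_and_test(&threads_left))
		complete(&all_done);
	return 0;
}

static int __init locktest_init(void)
{
	ktime_t start;
	int i;

	atomic_set(&threads_left, nthreads);
	start = ktime_get();
	for (i = 0; i < nthreads; i++)
		kthread_run(lock_thread, NULL, "locktest/%d", i);

	/* insmod blocks here until all threads have finished */
	wait_for_completion(&all_done);
	pr_info("locktest: %d threads, load %d: %lld ms\n",
		nthreads, load,
		ktime_to_ms(ktime_sub(ktime_get(), start)));
	return 0;
}

static void __exit locktest_exit(void)
{
}

module_init(locktest_init);
module_exit(locktest_exit);
MODULE_LICENSE("GPL");

Loaded as, say, "insmod locktest.ko nthreads=10 load=5", insmod would return once all threads finish and the elapsed time would show up in dmesg.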
>> It can be seen that there is a dramatic reduction in the execution
>> times. The new rwsem is now even faster than the mutex, whether the
>> load is all writers or a mixture of writers and readers.
>>
>> Running the AIM7 benchmarks on the same 40-core system (HT off), the
>> performance improvements on some of the workloads were as follows:
>>
>>   Workload             Before Patch  After Patch  % Change
>>   --------             ------------  -----------  --------
>>   custom (200-1000)        446135       477404      +7.0%
>>   custom (1100-2000)       449665       484734      +7.8%
>>   high_systime             152437       154217      +1.2%
>>    (200-1000)
>>   high_systime             269695       278942      +3.4%
>>    (1100-2000)
>
> I worry about complicating rwsems even _more_ than they are, especially
> for such a marginal gain. You might want to try other workloads, e.g.
> postgresql (pgbench); I normally get pretty useful data when dealing
> with rwsems.
Thanks for the info. I will try running pgbench as well.

-Longman