Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation

Phil Yang (Arm Technology China) Thu, 06 Jun 2019 03:17:32 -0700

From: David Marchand <[email protected]>
Sent: Thursday, June 6, 2019 12:30 AM
To: Phil Yang (Arm Technology China) <[email protected]>
Cc: dev <[email protected]>; [email protected]; [email protected]; 
[email protected]; Honnappa Nagarahalli <[email protected]>; 
Gavin Hu (Arm Technology China) <[email protected]>; nd <[email protected]>
Subject: Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation




On Wed, Jun 5, 2019 at 6:00 PM Phil Yang <[email protected]> wrote:
This patch set adds the MCS lock library and its unit test.

The MCS lock (proposed by JOHN M. MELLOR-CRUMMEY and MICHAEL L. SCOTT) provides
scalability by spinning on a CPU/thread-local variable, which avoids expensive
cache-line bouncing. It provides fairness by maintaining a list of acquirers and
passing the lock to each CPU/thread in the order in which they requested it.

References:
1. http://web.mit.edu/6.173/www/currentsemester/readings/R06-scalable-synchronization-1991.pdf
2. https://lwn.net/Articles/590243/
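
The queueing scheme described above can be sketched in a few lines of C11.
This is only an illustration of the general MCS technique, not the code from
the patch: the names mcs_node_t, mcs_lock_t, mcs_lock() and mcs_unlock() are
hypothetical, and the memory orderings are one reasonable choice rather than
the ones the library uses. Each waiter spins on the "locked" flag in its own
node, and the releaser hands the lock to its successor in queue order.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the rte_mcslock API. */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next; /* successor in the waiter queue */
    atomic_bool locked;              /* true while this waiter must spin */
} mcs_node_t;

/* The lock itself is just the queue tail; NULL means the lock is free. */
typedef _Atomic(mcs_node_t *) mcs_lock_t;

static void mcs_lock(mcs_lock_t *lock, mcs_node_t *me)
{
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* Atomically append our node to the tail of the queue. */
    mcs_node_t *prev = atomic_exchange_explicit(lock, me,
                                                memory_order_acq_rel);
    if (prev == NULL)
        return; /* queue was empty: lock acquired immediately */

    /* Link behind the predecessor, then spin on our own node only. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ; /* local spin: no other waiter reads this flag */
}

static void mcs_unlock(mcs_lock_t *lock, mcs_node_t *me)
{
    mcs_node_t *succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire);
    if (succ == NULL) {
        /* No visible successor: try to mark the lock free. */
        mcs_node_t *expected = me;
        if (atomic_compare_exchange_strong_explicit(lock, &expected, NULL,
                memory_order_release, memory_order_relaxed))
            return;
        /* A waiter swapped itself in but has not linked yet; wait. */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    /* Hand the lock to the next waiter in FIFO order. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}
```

In this sketch each thread passes its own mcs_node_t (for example a stack or
per-lcore variable) to both calls, which is what keeps the spinning local.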

Micro-benchmarking result:
------------------------------------------------------------------------------------------------
MCS lock                      | spinlock                       | ticket lock
------------------------------+--------------------------------+--------------------------------
Test with lock on 13 cores... |  Test with lock on 14 cores... |  Test with lock on 14 cores...
Core [15] Cost Time = 22426 us|  Core [14] Cost Time = 47974 us|  Core [14] cost time = 66761 us
Core [16] Cost Time = 22382 us|  Core [15] Cost Time = 46979 us|  Core [15] cost time = 66766 us
Core [17] Cost Time = 22294 us|  Core [16] Cost Time = 46044 us|  Core [16] cost time = 66761 us
Core [18] Cost Time = 22412 us|  Core [17] Cost Time = 28793 us|  Core [17] cost time = 66767 us
Core [19] Cost Time = 22407 us|  Core [18] Cost Time = 48349 us|  Core [18] cost time = 66758 us
Core [20] Cost Time = 22436 us|  Core [19] Cost Time = 19381 us|  Core [19] cost time = 66766 us
Core [21] Cost Time = 22414 us|  Core [20] Cost Time = 47914 us|  Core [20] cost time = 66763 us
Core [22] Cost Time = 22405 us|  Core [21] Cost Time = 48333 us|  Core [21] cost time = 66766 us
Core [23] Cost Time = 22435 us|  Core [22] Cost Time = 38900 us|  Core [22] cost time = 66749 us
Core [24] Cost Time = 22401 us|  Core [23] Cost Time = 45374 us|  Core [23] cost time = 66765 us
Core [25] Cost Time = 22408 us|  Core [24] Cost Time = 16121 us|  Core [24] cost time = 66762 us
Core [26] Cost Time = 22380 us|  Core [25] Cost Time = 42731 us|  Core [25] cost time = 66768 us
Core [27] Cost Time = 22395 us|  Core [26] Cost Time = 29439 us|  Core [26] cost time = 66768 us
                              |  Core [27] Cost Time = 38071 us|  Core [27] cost time = 66767 us
------------------------------+--------------------------------+--------------------------------
Total Cost Time = 291195 us   |  Total Cost Time = 544403 us   |  Total cost time = 934687 us
------------------------------------------------------------------------------------------------

Had a quick look, interesting.

Hi David,

Thanks for your comments.

Quick comments:
- your numbers are for 13 cores, while the others are for 14, what is the reason?
[Phil] The test case skips the master thread during the load test. The master
thread only controls the trigger, so all the other threads acquire the lock and
run the same workload at the same time.
In fact, per-core performance does not change when the master thread takes part
in the load test.

- do we need a per-architecture header? all I can see is generic code, we might 
as well directly put rte_mcslock.h in the common/include directory.
[Phil] I am just trying to leave room for architecture-specific optimization.

- could we replace the current spinlock with this approach? is this more 
expensive than spinlock on lightly contended locks? is there a reason we want to 
keep all these approaches? we could now have 3 lock implementations.
[Phil] Under high lock contention, MCS is much better than spinlock. However,
the MCS lock is more complicated than spinlock and more expensive in the
single-thread scenario. E.g.:
Test with lock on single core...
MCS lock:
Core [14] Cost Time = 327 us

Spinlock:
Core [14] Cost Time = 258 us

Ticket lock:
Core [14] cost time = 195 us
I think that in low-contention scenarios where you still need mutual exclusion,
you can use the spinlock; it is lighter. It all depends on the application.

- do we need to write the authors' names fully capitalized? it seems like you 
are shouting :-)
[Phil] :-)  I will modify it in the next version. Thanks.


--
David Marchand


Thanks,
Phil Yang
