From: David Marchand <david.march...@redhat.com>
Sent: Thursday, June 6, 2019 12:30 AM
To: Phil Yang (Arm Technology China) <phil.y...@arm.com>
Cc: dev <dev@dpdk.org>; tho...@monjalon.net; jer...@marvell.com; hemant.agra...@nxp.com; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Gavin Hu (Arm Technology China) <gavin...@arm.com>; nd <n...@arm.com>
Subject: Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation

On Wed, Jun 5, 2019 at 6:00 PM Phil Yang <phil.y...@arm.com> wrote:

This patch set adds an MCS lock library and its unit test.

The MCS lock (proposed by JOHN M. MELLOR-CRUMMEY and MICHAEL L. SCOTT) provides scalability by spinning on a CPU/thread-local variable, which avoids expensive cache bouncing. It provides fairness by maintaining a list of acquirers and passing the lock to each CPU/thread in the order they acquired the lock.

References:
1. http://web.mit.edu/6.173/www/currentsemester/readings/R06-scalable-synchronization-1991.pdf
2. https://lwn.net/Articles/590243/

Micro-benchmarking result:
----------------------------------------------------------------------------------------------
MCS lock                      | spinlock                      | ticket lock
------------------------------+-------------------------------+-------------------------------
Test with lock on 13 cores... | Test with lock on 14 cores... | Test with lock on 14 cores...
Core [15] Cost Time = 22426 us| Core [14] Cost Time = 47974 us| Core [14] cost time = 66761 us
Core [16] Cost Time = 22382 us| Core [15] Cost Time = 46979 us| Core [15] cost time = 66766 us
Core [17] Cost Time = 22294 us| Core [16] Cost Time = 46044 us| Core [16] cost time = 66761 us
Core [18] Cost Time = 22412 us| Core [17] Cost Time = 28793 us| Core [17] cost time = 66767 us
Core [19] Cost Time = 22407 us| Core [18] Cost Time = 48349 us| Core [18] cost time = 66758 us
Core [20] Cost Time = 22436 us| Core [19] Cost Time = 19381 us| Core [19] cost time = 66766 us
Core [21] Cost Time = 22414 us| Core [20] Cost Time = 47914 us| Core [20] cost time = 66763 us
Core [22] Cost Time = 22405 us| Core [21] Cost Time = 48333 us| Core [21] cost time = 66766 us
Core [23] Cost Time = 22435 us| Core [22] Cost Time = 38900 us| Core [22] cost time = 66749 us
Core [24] Cost Time = 22401 us| Core [23] Cost Time = 45374 us| Core [23] cost time = 66765 us
Core [25] Cost Time = 22408 us| Core [24] Cost Time = 16121 us| Core [24] cost time = 66762 us
Core [26] Cost Time = 22380 us| Core [25] Cost Time = 42731 us| Core [25] cost time = 66768 us
Core [27] Cost Time = 22395 us| Core [26] Cost Time = 29439 us| Core [26] cost time = 66768 us
                              | Core [27] Cost Time = 38071 us| Core [27] cost time = 66767 us
------------------------------+-------------------------------+-------------------------------
Total Cost Time = 291195 us   | Total Cost Time = 544403 us   | Total cost time = 934687 us
----------------------------------------------------------------------------------------------

Had a quick look, interesting.

Hi David,
Thanks for your comments.

Quick comments:
- your numbers are for 13 cores, while the others are for 14, what is the reason?

[Phil] The test case skipped the master thread while doing the load test. The master thread just controls the trigger, so all the other threads acquire the lock and run the same workload at the same time. Actually, there is no difference in per-core performance when the master thread is involved in the load test.

- do we need a per-architecture header? all I can see is generic code, we might as well directly put rte_mcslock.h in the common/include directory.

[Phil] I am just trying to leave room for architecture-specific optimization.

- could we replace the current spinlock with this approach? is this more expensive than spinlock on lowly contended locks? is there a reason we want to keep all these approaches? we could now have 3 lock implementations.

[Phil] Under high lock contention, MCS is much better than spinlock. However, the MCS lock is more complicated than spinlock and more expensive than spinlock in the single-thread scenario. E.g.:

Test with lock on single core...
MCS lock:    Core [14] Cost Time = 327 us
Spinlock:    Core [14] Cost Time = 258 us
ticket lock: Core [14] cost time = 195 us

I think in low-contention scenarios where you still need mutual exclusion, you can use spinlock. It is lighter. I think it all depends on the application.

- do we need to write the authors' names in fully capitalized form? it seems like you are shouting :-)

[Phil] :-) I will modify it in the next version. Thanks.

--
David Marchand

Thanks,
Phil Yang
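
For readers following the thread, the queueing behaviour described in the cover letter (each waiter spins on its own node, and the lock is handed over in arrival order) can be sketched roughly as below. This is only an illustrative sketch in C11 atomics, not the code from the patch set: the names mcs_node, mcs_lock_t, mcs_lock and mcs_unlock are hypothetical, and the memory orderings are one reasonable choice rather than the ones the patch uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the API proposed in the patch. */
struct mcs_node {
    _Atomic(struct mcs_node *) next; /* successor in the waiter queue */
    atomic_bool locked;              /* true while this waiter must spin */
};

/* The lock is just a pointer to the tail of the queue (NULL means free). */
typedef _Atomic(struct mcs_node *) mcs_lock_t;

static void mcs_lock(mcs_lock_t *lock, struct mcs_node *me)
{
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* Append ourselves to the queue; the old tail is our predecessor. */
    struct mcs_node *prev =
        atomic_exchange_explicit(lock, me, memory_order_acq_rel);
    if (prev == NULL)
        return; /* queue was empty: lock acquired immediately */

    /* Link behind the predecessor, then spin on our own flag only. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ; /* local spin: no other waiter writes this cache line */
}

static void mcs_unlock(mcs_lock_t *lock, struct mcs_node *me)
{
    /* Debug check: the lock must be held when unlocking. */
    assert(atomic_load_explicit(lock, memory_order_relaxed) != NULL);

    struct mcs_node *succ =
        atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* No visible successor: try to swing the tail back to NULL. */
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                lock, &expected, NULL,
                memory_order_release, memory_order_relaxed))
            return;
        /* A new waiter swapped the tail but has not linked itself yet. */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    /* Hand the lock to the next waiter in FIFO order. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}
```

In this sketch each thread passes its own mcs_node (for example one on its stack), and that node has to stay valid until the matching mcs_unlock returns.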