Hi, hackers!

> I'm planning to gather more detailed statistics on different
> LWLockAcquire calls soon to understand prospects for further
> optimizations.
So, I've made more measurements.

1. Applied measuring patch 0001 on top of the lockless queue optimization
patch (v2) from [0] in this thread and ran the same concurrent insert test
described in [1] on 20 pgbench connections.

The new results for the ProcArray lwlock are as follows:

exacq 45132              // Overall number of exclusive locks taken
ex_attempt[0] 20755      // Exclusive locks taken immediately
ex_attempt[1] 18800      // Exclusive locks taken after one wait on the semaphore
ex_attempt[2] 5577       // Exclusive locks taken after two or more waits on the semaphore
shacq 494871             // .. same stats for shared locks
sh_attempt[0] 463211     // ..
sh_attempt[1] 29767      // ..
sh_attempt[2] 1893       // .. same stats for shared locks
sh_wake_calls 31070      // Number of calls to wake up shared waiters
sh_wakes 36190           // Number of shared waiters woken up
GroupClearXid 55300      // Number of calls of ProcArrayGroupClearXid
EndTransactionInternal: 236193 // Number of calls of ProcArrayEndTransactionInternal

2. Applied measuring patch 0002 on top of Andres Freund's patch v3 from [2]
and ran the same concurrent insert test described in [1] on 20 pgbench
connections.

The results for the ProcArray lwlock are as follows:

exacq 49300              // Overall number of exclusive locks taken
ex_attempt1[0] 18353     // Exclusive locks taken immediately by the first call of LWLockAttemptLock in the LWLockAcquire loop
ex_attempt2[0] 18144     // Exclusive locks taken immediately by the second call of LWLockAttemptLock in the LWLockAcquire loop
ex_attempt1[1] 9985      // Exclusive locks taken after one wait on the semaphore by the first call of LWLockAttemptLock in the LWLockAcquire loop
ex_attempt2[1] 1838      // Exclusive locks taken after one wait on the semaphore by the second call of LWLockAttemptLock in the LWLockAcquire loop
ex_attempt1[2] 823       // Exclusive locks taken after two or more waits on the semaphore by the first call of LWLockAttemptLock in the LWLockAcquire loop
ex_attempt2[2] 157       // Exclusive locks taken after two or more waits on the semaphore by the second call of LWLockAttemptLock in the LWLockAcquire loop
shacq 508131             // .. same stats for shared locks
sh_attempt1[0] 469410    // ..
sh_attempt2[0] 27858     // ..
sh_attempt1[1] 10309     // ..
sh_attempt2[1] 460       // ..
sh_attempt1[2] 90        // ..
sh_attempt2[2] 4         // .. same stats for shared locks
dequeue self 48461       // Number of dequeue_self calls
sh_wake_calls 27560      // Number of calls to wake up shared waiters
sh_wakes 19408           // Number of shared waiters woken up
GroupClearXid 65021      // Number of calls of ProcArrayGroupClearXid
EndTransactionInternal: 249003 // Number of calls of ProcArrayEndTransactionInternal

It seems that the two calls in each loop in Andres's (and master) code help
avoid semaphore-waiting loops, which may be relatively expensive. The
probable reason is that the small delay between these two calls is sometimes
enough for concurrent takers to free the spinlock for the queue modification.

Could we get even more performance if we make three or more attempts to take
the lock in the queue? Would this degrade performance in some other cases?

Or maybe there is another explanation for the now small performance
difference around 20 connections described in [0]?

Thoughts?

Regards,
Pavel Borisov

[0] https://www.postgresql.org/message-id/CALT9ZEF7q%2BSarz1MjrX-fM7OsoU7CK16%3DONoGCOkY3Efj%2BGrnw%40mail.gmail.com
[1] https://www.postgresql.org/message-id/CALT9ZEEz%2B%3DNepc5eti6x531q64Z6%2BDxtP3h-h_8O5HDdtkJcPw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20221031235114.ftjkife57zil7ryw%40awork3.anarazel.de
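
P.S. A rough sketch of the arithmetic behind the observation about semaphore waits, using only the ProcArray counters posted above. The dictionary layout and the helper name waited_fraction are illustrative and not part of either measuring patch; for Andres's v3 the first and second LWLockAttemptLock counters are simply summed per wait count, which is an assumption about how the two runs should be compared.

```python
# Counters copied from the two runs above (ProcArray lwlock, 20 connections).
# Index 0: no semaphore wait, 1: one wait, 2: two or more waits.
lockless_v2 = {
    "ex": [20755, 18800, 5577],
    "sh": [463211, 29767, 1893],
}

# Andres's v3: sum the first-call and second-call counters per wait count
# (an assumption made here so both runs share the same three buckets).
andres_v3 = {
    "ex": [18353 + 18144, 9985 + 1838, 823 + 157],
    "sh": [469410 + 27858, 10309 + 460, 90 + 4],
}


def waited_fraction(attempts):
    """Share of acquisitions that slept on the semaphore at least once."""
    total = sum(attempts)
    return (total - attempts[0]) / total


for name, stats in (("lockless v2", lockless_v2), ("Andres v3", andres_v3)):
    print(f"{name}: exclusive {waited_fraction(stats['ex']):.1%}, "
          f"shared {waited_fraction(stats['sh']):.1%}")
```

With these inputs, roughly 54% of exclusive acquisitions waited on the semaphore at least once with the lockless queue patch, against roughly 26% with Andres's v3. This is only a restatement of the posted counters, not a new measurement.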
0001-Print-extended-lwlock_stats-and-proc_stats-on-CAS-re.patch
0002-Print-extended-lwlock_stats-and-proc_stats.patch