I wrote:
> So it seems it's time to start thinking about how to reduce contention
> for the LockMgrLock.
> ...
> The best idea I've come up with after a bit of thought is to replace the
> shared lock table with N independent tables representing partitions of the
> lock space.
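For anyone who missed that thread: the idea is simply that each lock tag
hashes to one of NUM_LOCK_PARTITIONS separate hash tables, each protected
by its own LWLock, so backends working on unrelated locks don't all
serialize on a single LockMgrLock.  A toy standalone illustration of the
mapping (not the committed code --- the tag fields and hash below are
made-up stand-ins; the backend uses its regular LOCKTAG and hashing
machinery):

/*
 * Toy sketch only: map a simplified lock tag onto one of
 * NUM_LOCK_PARTITIONS partitions, each of which would have its own
 * hash table and its own LWLock.
 */
#include <stdint.h>
#include <stdio.h>

#define NUM_LOCK_PARTITIONS 16		/* cf. src/include/storage/lock.h */

typedef struct
{
	uint32_t	dbOid;				/* simplified stand-in fields */
	uint32_t	relOid;
	uint32_t	objId;
} FakeLockTag;

static uint32_t
lock_tag_hash(const FakeLockTag *tag)
{
	/* toy hash; not what the backend actually uses */
	return tag->dbOid * 2654435761u ^ tag->relOid * 40503u ^ tag->objId;
}

static int
lock_partition(const FakeLockTag *tag)
{
	return (int) (lock_tag_hash(tag) % NUM_LOCK_PARTITIONS);
}

int
main(void)
{
	FakeLockTag tag = {1, 16384, 0};

	/* the lock manager would LWLockAcquire only this partition's lock */
	printf("lock tag falls into partition %d\n", lock_partition(&tag));
	return 0;
}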
I've committed changes along this line.  Testing with pgbench on a dual
HT Xeon, I get numbers like this (for successive -c 10 -t 3000 runs
after an -s 10 initialization):

Previous CVS HEAD:
tps = 1561.983651 (including connections establishing)
tps = 1510.301236 (including connections establishing)
tps = 1496.679616 (including connections establishing)

With 4 partitions:
tps = 1671.311892 (including connections establishing)
tps = 1620.093917 (including connections establishing)
tps = 1598.887515 (including connections establishing)

With 16 partitions:
tps = 1689.662504 (including connections establishing)
tps = 1595.530388 (including connections establishing)
tps = 1609.552501 (including connections establishing)

CPU idle percentage according to "top" is around 5% for the previous
HEAD, and around 2% for either of the partition cases.  I didn't see
any dropoff in CS rate however --- seemed to be around 35K in all
cases.

The TPS rates for a single client are the same to within measurement
noise, so it seems we're not paying too much for the extra
LWLockAcquire/Release cycles during LockReleaseAll.

As you can see, there's not a lot of difference between the 4- and
16-partition numbers; this is probably because the OIDs assigned in
pgbench's simplistic schema are such that the load is fairly evenly
distributed across partitions in both cases.  We need to test some
other scenarios to see which size we should go with.  (If you want to
test, change NUM_LOCK_PARTITIONS in src/include/storage/lock.h, and be
sure to recompile the whole backend because this affects the PGPROC
struct.)

I spent some time looking at the lock acquire/conflict counts using the
same patch mentioned previously, and got some moderately interesting
numbers.  A representative value of the per-process counts for the
single LockMgrLock was

PID 12972 lwlock LockMgrLock: shacq 0 exacq 50204 blk 3243

In the old code, there were 15 predictable LockMgrLock acquisitions per
pgbench transaction (for transaction and relation locks), or 45000 for
the whole run; the majority of the other 5K acquisitions seem to be for
RelationExtension locks, with a few hundred Tuple locks occurring due
to update contention on rows of the "branches" table.

With 4 lock partitions, a typical process shows

PID 20471 lwlock 20: shacq 0 exacq 8809 blk 115
PID 20471 lwlock 21: shacq 0 exacq 10933 blk 245
PID 20471 lwlock 22: shacq 0 exacq 20267 blk 503
PID 20471 lwlock 23: shacq 0 exacq 17148 blk 404
TOTAL                          exacq 57157 blk 1267

and with 16:

PID 13367 lwlock 20: shacq 0 exacq 679 blk 1
PID 13367 lwlock 21: shacq 0 exacq 648 blk 2
PID 13367 lwlock 22: shacq 0 exacq 665 blk 3
PID 13367 lwlock 23: shacq 0 exacq 12611 blk 262
PID 13367 lwlock 24: shacq 0 exacq 773 blk 3
PID 13367 lwlock 25: shacq 0 exacq 6715 blk 80
PID 13367 lwlock 26: shacq 0 exacq 781 blk 1
PID 13367 lwlock 27: shacq 0 exacq 6706 blk 89
PID 13367 lwlock 28: shacq 0 exacq 6507 blk 68
PID 13367 lwlock 29: shacq 0 exacq 731 blk 2
PID 13367 lwlock 30: shacq 0 exacq 9492 blk 170
PID 13367 lwlock 31: shacq 0 exacq 837 blk 3
PID 13367 lwlock 32: shacq 0 exacq 6530 blk 81
PID 13367 lwlock 33: shacq 0 exacq 717 blk 1
PID 13367 lwlock 34: shacq 0 exacq 6564 blk 74
PID 13367 lwlock 35: shacq 0 exacq 831 blk 0
TOTAL                          exacq 61787 blk 840

The increase in the total number of acquisitions happens because
LockReleaseAll needs to touch several partitions during each
transaction commit.  There are seven relations in the test (4 tables,
3 indexes) and you can clearly see which partitions their locks fell
into during the 16-way test.
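To make the LockReleaseAll point concrete, here's a toy sketch (my own
simplification, not the backend source) of why commit now costs more
LWLock traffic: instead of a single LockMgrLock acquire/release per
LockReleaseAll, each partition that holds any of the backend's locks
has to be locked and unlocked separately.

/*
 * Toy sketch only: commit-time lock release with partitioned lock
 * tables.  The partition_lock_* calls stand in for LWLockAcquire and
 * LWLockRelease on the per-partition locks.
 */
#include <stdbool.h>
#include <stdio.h>

#define NUM_LOCK_PARTITIONS 16

static void partition_lock_acquire(int p) { (void) p; }
static void partition_lock_release(int p) { (void) p; }

/*
 * Release every lock this backend holds.  With one LockMgrLock this was
 * a single acquire/release; now it's one per partition we have locks in.
 */
static int
release_all_locks(const bool holds_locks_in[NUM_LOCK_PARTITIONS])
{
	int			cycles = 0;

	for (int p = 0; p < NUM_LOCK_PARTITIONS; p++)
	{
		if (!holds_locks_in[p])
			continue;			/* nothing of ours in this partition */

		partition_lock_acquire(p);
		/* ... walk this partition's hash table, dropping our entries ... */
		partition_lock_release(p);
		cycles++;
	}
	return cycles;
}

int
main(void)
{
	bool		held[NUM_LOCK_PARTITIONS] = {false};

	/* say the seven relations' locks landed in four distinct partitions */
	held[3] = held[5] = held[7] = held[10] = true;

	printf("%d partition lock acquire/release cycles at commit\n",
		   release_all_locks(held));
	return 0;
}

That lines up with the counts above: the total exacq goes up somewhat,
while the blk counts (the actual contention) go down.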
(Transaction and tuple locks, by contrast, will be spread pretty evenly
across all the partitions, because those locktags change constantly.)

We are getting a reduction in contention, as shown by the falling
number of lock blockages, but we're paying for it with more lock
acquisition cycles.

Bottom line is that this seems to have been a useful improvement, but
it didn't get us as far as I'd hoped.  Any thoughts on other things to
try?

			regards, tom lane