I've completed a review of all LWLock usage in the backends. This is documented in the enclosed file. I propose that we use it as comments in lwlock.h or in the README, if people agree.
A number of points emerge from that analysis:

1. The ProcArrayLock is acquired Exclusive-ly by only one remaining operation: XidCacheRemoveRunningXids(). Reducing things to that level is brilliant work, Florian and Tom. After analysis, I am still concerned, because subxact abort could now be starved out by a large number of shared holders; then, once the lock is acquired, we may see starvation of shared requestors, as described in point (4) here:
http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php
I no longer want to solve it in the way described there, but have a solution described in a separate post on -hackers. The original solution still seems valid, but if we can solve it another way we should.

2. CountActiveBackends() searches the whole of the proc array, even though it could stop when it gets to commit_siblings. Stopping once the heuristic has been determined seems like the best thing to do. A small patch to implement this is attached.

3. ReceiveSharedInvalidMessages() takes a Shared lock on SInvalLock, then takes an Exclusive lock later in the same routine to perform SIDelExpiredDataEntries(). The latter routine examines data that it hasn't touched to see if it can delete anything. If it finds anything other than its own consumed message, it will only be because it beat another backend in the race to delete a message that backend just consumed. So most callers of SIDelExpiredDataEntries() will do nothing at all, after having queued for an X lock. I can't see the sense in that, but maybe there is some deeper purpose? ISTM that we should only attempt to clean the queue when it fills, during SIInsertDataEntry(), which it already does. We want to avoid continually re-triggering postmaster signals, but we should do that anyway with a "yes-I-already-did-that" flag, rather than by eager cleaning of the queue, which just defers a postmaster signal storm, but does not prevent it.

4. WALWriteLock is acquired in Shared mode by bgwriter when it runs GetLastSegSwitchTime(). All other callers are Exclusive lockers, so the Shared request will queue like everybody else. The WALWriteLock queue can be long, so the bgwriter can get stuck for much longer than bgwriter_delay when it makes this call. This happens only when archive_timeout > 0, so it has probably never shown up in any performance testing. XLogWrite takes info_lck also, so we can move lastSegSwitchTime behind that lock instead. That way bgwriter need never wait on I/O, just spin for access to info_lck. Minor change.

5. ReadNewTransactionId() is only called now by GetNextXidAndEpoch(), but I can't find a caller of that anywhere in core or contrib. Can those now be removed?

6. David Strong talked about doing some testing to see if NUM_BUFFER_PARTITIONS should be increased above 16. We don't have any further information on that. Should we increase the value to 32 or 64? A minor increase seems safe and should provide the most gain without decreasing performance for lower numbers of CPUs.

7. VACUUM has many contention points within it, so HOT should avoid the annoyance of having to run VACUUM repeatedly on very small heavily-updated tables.

I haven't further analysed the SLRU locks, since nothing much has changed there recently and they were already pretty efficient, IIRC.

I'm working on patches for 1-4. We've moved far in recent weeks, so it seems like we should finish the job.

Comments?

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
BufFreelistLock,    /* X - all - each time we allocate a new buffer for data block I/O
                     * Never held across I/O */
ShmemIndexLock,     /* X - all - create/attach to shared memory
                     * Never held across I/O */
OidGenLock,         /* X - all - each GetNewOid() and each GetNewRelFileNode()
                     * S - bgwriter - acquired during checkpoint
                     * Writes WAL record every 8192 OIDs, so vanishing chance
                     * of being held across I/O */
XidGenLock,         /* X - all - for each GetNewTransactionId()
                     *     check whether we need to call ExtendClog or ExtendSubtrans
                     *     could be held across I/O if clog or subtrans buffers
                     *     have a dirty LRU page
                     * S - all - for each ReadNewTransactionId()                    *5
                     *     called by GetNextXidAndEpoch(),
                     *     once per VACUUM of each relation
                     *     once per start of autovacuum worker
                     * X - all - for each SetTransactionIdLimit()
                     *     called after each VACUUM of whole database and
                     *     at EOXact if we update catalogs and write relcache file
                     * X - bgwriter - acquired during checkpoint */
ProcArrayLock,      /* X - all - adding/removing procs from procarray
                     *     backend start or exit, two phase commits                 *1
                     * X - all - XidCacheRemoveRunningXids()
                     * S - all - TransactionIdIsInProgress(), TransactionIdIsActive(),
                     *     GetOldestXmin(), GetSnapshotData(), GetTransactionsInCommit()
                     *     HaveTransactionsInCommit(), BackendPidGetProc(),
                     *     BackendXidGetProc(), GetCurrentVirtualXids(),
                     *     CountDBBackends(), CountUserBackends(),                  *2
                     *     CheckOtherDBBackends(), CountActiveBackends() */
SInvalLock,         /* X - all - backend startup or exit
                     * X - all - send SInval message
                     * S - all - receive SInval message                             *3
                     * X - all - release dead SInval messages */
FreeSpaceLock,      /* X - access to the FSM to reuse a block, record freespace
                     * X - held during VACUUM to record free space, maybe rearrange FSM
                     * Never held across I/O, except at database startup/shutdown */
WALInsertLock,      /* X - insert data into WAL buffers
                     * Holder may acquire WALWriteLock if WAL buffers full */
WALWriteLock,       /* X - any - write WAL buffers to disk - Always held across I/O *4
                     * S - bgwriter - each loop checks GetLastSegSwitchTime()
                     * Holder may conditionally acquire WALInsertLock to perform
                     * piggyback I/O on WAL */
ControlFileLock,    /* X - any - must be held to read/write from Control file
                     * Always held across I/O */
CheckpointLock,     /* X - bgwriter - must be held to perform CreateCheckpoint
                     * Holder always acquires WALInsertLock, XidGenLock, OidGenLock,
                     * ProcArrayLock and ControlFileLock */
CLogControlLock,
SubtransControlLock,
MultiXactGenLock,
MultiXactOffsetControlLock,
MultiXactMemberControlLock,     /* SLRU locks */
RelCacheInitLock,
BgWriterCommLock,
TwoPhaseStateLock,
TablespaceCreateLock,
BtreeVacuumLock,
AddinShmemInitLock,
AutovacuumLock,
AutovacuumScheduleLock,
SyncScanLock,       /* X - any - once per large SeqScan, plus conditionally once
                     * per ~16 blocks, during ss_report_location() */
/* Individual lock IDs end here */
FirstBufMappingLock,                                                                *6
FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,

/* must be last except for MaxDynamicLWLock: */
NumFixedLWLocks = FirstLockMgrLock + NUM_LOCK_PARTITIONS,

MaxDynamicLWLock = 1000000000
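
To make point 2 of the message concrete, here is a minimal sketch of an early-exit scan. It is not the attached patch and not PostgreSQL source: SketchProc, its two fields, and sketch_min_active_backends() are hypothetical names invented for illustration, and the "active" test is a simplified assumption. The only idea taken from the message is that the scan can stop as soon as the commit_siblings threshold has been reached, which shortens the time the shared lock would be held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a proc array entry; NOT PostgreSQL's PGPROC. */
typedef struct SketchProc
{
    bool in_transaction;    /* assumed: backend has an open transaction */
    bool waiting_on_lock;   /* assumed: backend is blocked on a lock */
} SketchProc;

/*
 * Return true as soon as at least min_active entries count as active,
 * instead of counting every entry in the array. If visited is not NULL,
 * it receives the number of entries examined, so the early exit can be
 * observed by the caller.
 */
bool
sketch_min_active_backends(const SketchProc *procs, size_t nprocs,
                           int min_active, size_t *visited)
{
    int     count = 0;
    size_t  i;

    assert(procs != NULL || nprocs == 0);

    /* A threshold of zero or less is satisfied without scanning. */
    if (min_active <= 0)
    {
        if (visited != NULL)
            *visited = 0;
        return true;
    }

    for (i = 0; i < nprocs; i++)
    {
        /* Skip entries that are idle or blocked (simplified rule). */
        if (!procs[i].in_transaction || procs[i].waiting_on_lock)
            continue;

        /* Stop scanning once the heuristic has been determined. */
        if (++count >= min_active)
        {
            if (visited != NULL)
                *visited = i + 1;
            return true;
        }
    }

    if (visited != NULL)
        *visited = nprocs;
    return false;
}
```

A caller that today compares a full count against commit_siblings would instead pass the threshold in and test the boolean result. The worst case still scans the whole array, so the gain only appears when enough active backends are found early.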