Hi,

The current wait events are already pretty useful. But I think we could make them more informative without adding real runtime overhead.
1) For lwlocks I think it'd be quite useful to show the mode of acquisition in pg_stat_activity.wait_event_type, instead of just saying 'LWLock'. I think we should split PG_WAIT_LWLOCK into PG_WAIT_LWLOCK_{EXCLUSIVE,SHARED,WAIT_UNTIL_FREE}, and report a different wait_event_type based on the class (rough sketch at the end of this mail). The fact that it'd break people explicitly looking for 'LWLock' in pg_stat_activity doesn't seem to outweigh the benefits to me.

2) I think it's unhelpful that waits for WAL insertion locks to progress show up as LWLock acquisitions. LWLockWaitForVar() feels like a distinct enough operation that passing in a caller-specified wait event is worth the minuscule incremental overhead that'd add. I'd probably just make it a different wait class, and have xlog.c compute the event based on the number of the slot being waited for (see the second sketch below).

3) I have observed waking up other processes as part of a lock release to be a significant performance factor. I would like to add a separate wait event type for that. That'd be a near-trivial extension to 1).

I also think there's a 4), but there the tradeoffs are a bit more complicated:

4) For a few types of lwlock just knowing the tranche isn't sufficient. E.g. knowing whether one or several different buffer mapping locks are being waited on is important to judge contention.

Right now wait events use 1 byte for the class, 1 byte is unused, and 2 bytes carry event-specific information (the tranche, in the case of lwlocks). Seems like we could change the split to a 4 bit class, leaving 28 bit for the specific wait event? And in the lwlock case we could then use something like 4 bit class, 8 bit tranche, 20 bit sub-tranche, which adds up to 32 (see the last sketch below). 20 bit aren't enough to uniquely identify a lock in the larger tranches (mostly buffer locks, I think), but I think it'd still be enough to disambiguate.

The hardest part would be knowing how to identify individual locks. The easiest would probably be to mask in part of the lwlock's address (e.g. shift it right by INTALIGN, and then mask the result into the eventId). That seems a bit unsatisfying.

We could probably do a bit better: we could store the tranche / offset-within-tranche information at LWLockInitialize() time, instead of computing something just before waiting. While LWLock.tranche is only 16 bits right now, the following two bytes are currently padding... That'd allow us to have proper numerical identification for nearly all tranches, without needing to go back to the complexity of having tranches specify base & stride. Even more API churn around lwlock initialization isn't desirable :(, but we could just add a LWLockInitializeIdentified() or such.
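For 1) and 3), a rough sketch of what I have in mind. The new class byte values are placeholders; they'd have to be picked to not collide with the existing PG_WAIT_* classes:

/* one wait class per acquisition mode, plus one for wakeup during release */
#define PG_WAIT_LWLOCK_EXCLUSIVE        0x01000000U
#define PG_WAIT_LWLOCK_SHARED           0x0B000000U
#define PG_WAIT_LWLOCK_WAIT_UNTIL_FREE  0x0C000000U
#define PG_WAIT_LWLOCK_WAKEUP           0x0D000000U

static inline uint32
LWLockWaitClass(LWLockMode mode)
{
    switch (mode)
    {
        case LW_SHARED:
            return PG_WAIT_LWLOCK_SHARED;
        case LW_WAIT_UNTIL_FREE:
            return PG_WAIT_LWLOCK_WAIT_UNTIL_FREE;
        default:
            return PG_WAIT_LWLOCK_EXCLUSIVE;
    }
}

/*
 * In LWLockAcquire() etc., instead of today's
 * pgstat_report_wait_start(PG_WAIT_LWLOCK | lock->tranche):
 */
pgstat_report_wait_start(LWLockWaitClass(mode) | lock->tranche);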
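For 2), the signature change would be roughly the following. PG_WAIT_WAL_INSERT is made up, and or'ing the slot number into the event only works out if the encoding leaves room for it, cf. 4):

extern bool LWLockWaitForVar(LWLock *lock, uint64 *valptr,
                             uint64 oldval, uint64 *newval,
                             uint32 wait_event_info);

/* in xlog.c's WaitXLogInsertionsToFinish(), something like: */
LWLockWaitForVar(&WALInsertLocks[i].l.lock,
                 &WALInsertLocks[i].l.insertingAt,
                 insertingat, &insertingat,
                 PG_WAIT_WAL_INSERT | i);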
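For 4), a sketch of both the encoding and of storing the identification at initialization time. All new names are made up; with only the two padding bytes available the in-struct id is 16 bit, which fits comfortably into the 20 bit of event space:

/* 4 bit class, 8 bit tranche, 20 bit sub-tranche */
#define LWLOCK_WAIT_EVENT(classId, tranche, sub) \
    (((uint32) (classId) << 28) | \
     (((uint32) (tranche) & 0xFF) << 20) | \
     ((uint32) (sub) & 0xFFFFF))

/* the unsatisfying address-based variant, computed just before waiting */
sub = ((uintptr_t) lock >> 4) & 0xFFFFF;

/* storing the id at initialization time instead, in the bytes that are
 * currently padding: */
typedef struct LWLock
{
    uint16      tranche;        /* tranche ID */
    uint16      subtranche;     /* offset within tranche; was padding */
    pg_atomic_uint32 state;     /* state of exclusive/nonexclusive lockers */
    proclist_head waiters;      /* list of waiting PGPROCs */
} LWLock;

void
LWLockInitializeIdentified(LWLock *lock, int tranche_id, uint32 sub_id)
{
    lock->tranche = tranche_id;
    /* truncation is acceptable, we only need to disambiguate */
    lock->subtranche = (uint16) sub_id;
    pg_atomic_init_u32(&lock->state, LW_FLAG_RELEASE_OK);
    proclist_init(&lock->waiters);
}

Greetings,

Andres Freund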